Abstract

The performance of demand-driven caching is known to depend on the
locality  of  reference  exhibited  by  the  stream  of  requests  made  to  the 
cache.  In  spite  of  numerous  efforts,  no  consensus  has  been  reached  on 
how  to  formalize  this  notion,  let  alone  on  how  to  compare  streams  of 
requests  on  the  basis  of  their  locality  of  reference.  We  take  on  this  issue 
with  an  eye  towards  validating  operational  expectations  associated  with 
the notion of locality of reference. We focus on two “folk theorems,”
namely (i) the stronger the locality of reference, the smaller the miss rate
of the cache; and (ii) good caching is expected to produce an output stream
of  requests  exhibiting  less  locality  of  reference  than  the  input  stream  of 
requests. 

We discuss these two folk theorems in the context of a cache operating
under a demand-driven replacement policy when document requests
are modeled according to the Independent Reference Model (IRM). As
we propose to measure strength of locality of reference in a stream of
requests through the skewness of its popularity distribution, we introduce
the notion of majorization as a means for capturing this degree of
skewness. We show that these folk theorems hold for caches operating
under a large class of cache replacement policies, including the optimal
policy $A_0$ and the random policy, but may fail under the LRU policy.
1.  Introduction

Web  caching  aims  to  reduce  network  traffic,  server  load  and  user- 
perceived  retrieval  latency  by  replicating  “popular”  content  on  (proxy) 
caches  that  are  strategically  placed  within  the  network,  e.g.,  Wang  (1999) 
(and  references  therein).  This  approach  is  a  natural  outgrowth  of  caching 
techniques  which  were  originally  developed  for  computer  memory  and 
distributed file sharing systems, e.g., Aven, Coffman and Kogan (1987);
Coffman and Denning (1973); Phalke and Gopinath (1995) (and references
therein). However, the exponential growth of the World Wide Web
and  its  specific  circumstances  are  challenging  current  cache  architectures 
to  meet  the  complementary  mandates  of  speed,  scalability  and  reliability 
which  are  central  to  delivering  a  satisfactory  user  experience. 

Although these challenges have renewed interest in caching in general,
some basic issues are still not well understood. Indeed, the performance
of any form of caching is determined by a number of factors, chief
amongst them the statistical properties of the streams of requests made
to the cache. One important such property is the locality of reference
present in a stream of requests whereby “bursts of references are made
in the near future to objects referenced in the recent past.” The importance
of locality for caching was first recognized by Belady (1966) in
the context of computer memory, and attempts at characterizing it were
made early on by Denning (1968) through the working set model. Recently,
a number of studies have shown that streams of requests for Web
objects exhibit strong locality of reference¹ (see e.g., Jin and Bestavros
(2000b); Mahanti, Williamson and Eager (2000)). Like the notion of
burstiness used in traffic modeling, locality of reference, while endowed
with a clear intuitive content, admits no simple definition.

Thus, and not surprisingly, in spite of numerous efforts, no consensus
has been reached on how to formalize the notion, let alone compare
streams of requests on the basis of their locality of reference.² To the
best of the authors’ knowledge, this lack of consensus has precluded the
formal derivation of the following “folk theorems”:

1. Folk theorem on miss rates: The stronger the locality of reference
in the stream of requests, the smaller the miss rate, since the
cache ends up being populated by objects with a higher likelihood
of access in the near future. Such a property, if true, would confirm
the central role played by locality of reference in shaping cache
performance. In fact, the very presence of locality of reference in
the stream of requests is what makes caching at all possible; and

2. Folk theorem on output streams: Good cache replacement
strategies “absorb” locality of reference to a certain extent by producing
a stream of misses from the cache (its so-called output)
which exhibits less locality of reference than the input stream of
requests. In the context of multi-level caching, this reduction property
is often perceived as one of the main reasons why caching
loses its effectiveness after some level in a hierarchy of caches.

¹ At least on short timescales.

² An exception can be found in a recent paper by Fonseca et al. (2003).

Comparing strength of locality of reference

Such  folk  theorems  are  expected  to  hold  for  demand-driven  caching  that 
exploits recency of reference. Interest in establishing them under a specific
definition of locality of reference stems from a desire to validate its
operational  significance.  Counterexamples  would  cast  some  doubts  as  to 
whether  the  particular  definition  indeed  captures  the  intuitive  meaning 
of  locality  of  reference. 

In the past such a program has been carried out for a number of key
notions of traffic engineering: For instance, the convex stochastic orderings
were shown to capture the notion of variability, in the process
leading to various proofs that “determinism minimizes waiting times,”
e.g., Baccelli and Makowski (1989). More recently, the theory of multivariate
stochastic orderings has been used to formalize the belief that
positive correlations lead to larger buffer levels at a discrete-time
infinite-capacity multiplexer queue, viz., if the input traffic is larger than
its independent version in the supermodular ordering, then their corresponding
buffer contents are similarly ordered in the increasing convex
ordering. This has been demonstrated for a number of basic traffic models
in Vanichpun and Makowski (2002).

In this chapter we survey and extend recent results by the authors
concerning a formal investigation into the folk theorems mentioned earlier,
albeit in a simple framework. The results for miss rates and output
streams are available in Vanichpun and Makowski (2004a) and Vanichpun
and Makowski (2004b), respectively, and the interested reader is
referred to these papers for additional information. In the next section,
we  provide  a  roadmap  to  the  viewpoint  we  have  adopted  and  to  the 
ensuing  results,  as  well  as  the  organization  of  the  chapter. 

2.  Navigating  the  chapter  -  A  roadmap 

2.1  Locality  via  popularity 

Our first task consists in identifying the notion of locality of reference
to be used here. We begin with the widely accepted observation that the
two main contributors to locality of reference are temporal correlations
in the streams of requests and the popularity distribution of requested
objects. To describe these two sources of locality, we assume the following
generic setup which is used throughout: We consider $N$ cacheable
items or documents, labeled $i = 1, \ldots, N$, and we write $\mathcal{N} := \{1, \ldots, N\}$.
The successive requests arriving at the cache are modeled by a sequence
$\{R_t, \ t = 0, 1, \ldots\}$ of $\mathcal{N}$-valued rvs.

1. The popularity of the sequence of requests $\{R_t, \ t = 0, 1, \ldots\}$ is
defined as the pmf $p = (p(1), \ldots, p(N))$ on $\mathcal{N}$ given by

$$p(i) := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbf{1}[R_\tau = i] \quad \text{a.s.}, \qquad i = 1, \ldots, N \tag{1.1}$$

whenever these limits exist (and they do in most models treated in the
literature).

2. Temporal correlations are more delicate to define due to the “categorical”
nature of the requests $\{R_t, \ t = 0, 1, \ldots\}$. Indeed, it is somewhat
meaningless to use the covariance function

$$\gamma(s,t) := \mathrm{Cov}[R_s, R_t], \qquad s, t = 0, 1, \ldots$$

as a way to capture these temporal correlations as is traditionally done in
other contexts. This is because the rvs $\{R_t, \ t = 0, 1, \ldots\}$ take values in
a discrete set. We took $\{1, \ldots, N\}$ but could have selected any set of $N$
distinct points in an arbitrary space. Thus, the actual values of the rvs
$\{R_t, \ t = 0, 1, \ldots\}$ are of no consequence, and the focus should instead be
on the recurrence patterns exhibited by requests for particular documents
over time. The literature contains several metrics to do this, including
the inter-reference time of Phalke and Gopinath (1995), the working set
size of Denning (1968) and the stack distance, see e.g., Almeida et al.
(1996).

We shall focus exclusively on popularity as the measure of locality
of reference. In fact, to isolate its contribution, we deal with the situation
where there are no temporal correlations in the stream of requests,
as would be the case under the so-called Independent Reference
Model (IRM). More precisely, under the IRM with popularity pmf
$p = (p(1), \ldots, p(N))$, the successive requests $\{R_t, \ t = 0, 1, \ldots\}$ form a
sequence of i.i.d. $\mathcal{N}$-valued rvs, each distributed according to the pmf $p$,
i.e.,

$$\mathbf{P}[R_t = i] = p(i), \qquad i = 1, \ldots, N \tag{1.2}$$

for all $t = 0, 1, \ldots$, and (1.1) holds with the given pmf $p$ by the Law of
Large Numbers.
IRMs do display locality of reference even though there are no temporal
correlations. This is best appreciated by considering the limiting cases:
If $p$ is extremely unbalanced with $p = (1 - \delta, \varepsilon, \ldots, \varepsilon)$ (with $\delta = (N - 1)\varepsilon$),
a reference to document 1 is likely to be followed by a burst of
additional references to document 1 provided $(N - 1)\varepsilon \ll 1 - \delta$. It seems
natural to deem this situation as one exhibiting very strong locality of
reference. The exact opposite conclusion holds if the popularity pmf
$p$ were uniform, i.e., $p(1) = \ldots = p(N) = \frac{1}{N}$, for then the successive
requests $\{R_t, \ t = 0, 1, \ldots\}$ form a truly random sequence, in which case
there is no locality of reference. Thus, the skewness of $p$ appears to
act as an indicator of the strength of locality of reference present in the
stream, under the intuition that the more “balanced” the pmf $p$, the
weaker the locality of reference.
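
The two limiting cases can be made concrete with a small numerical sketch. It is only an illustration and not part of the original analysis: the helper names (`sample_irm`, `empirical_popularity`, `repeat_probability`) and the parameter values are assumptions made here. The sketch samples an IRM stream, forms the finite-horizon averages behind (1.1), and evaluates the probability that two consecutive requests coincide, which under independence equals the sum of the squares $p(i)^2$.

```python
import random

def sample_irm(p, length, seed=0):
    """Draw `length` i.i.d. requests from the pmf p (documents are labeled 1..N)."""
    rng = random.Random(seed)
    docs = list(range(1, len(p) + 1))
    return rng.choices(docs, weights=p, k=length)

def empirical_popularity(requests, n_docs):
    """Time-average frequencies, i.e. the finite-horizon version of (1.1)."""
    counts = [0] * n_docs
    for r in requests:
        counts[r - 1] += 1
    return [c / len(requests) for c in counts]

def repeat_probability(p):
    """P[R_{t+1} = R_t] under the IRM: sum of p(i)^2, by independence."""
    return sum(x * x for x in p)

# Illustrative values only (not taken from the chapter).
N = 5
eps = 0.01
delta = (N - 1) * eps
skewed = [1 - delta] + [eps] * (N - 1)
uniform = [1 / N] * N

print(repeat_probability(skewed))    # close to 1: long runs of document 1
print(repeat_probability(uniform))   # equals 1/N: no preferred document
print(empirical_popularity(sample_irm(skewed, 10_000), N))
```

The repeat probability is one simple scalar summary of skewness; it is used here only to illustrate the intuition, not as the comparison tool adopted in the chapter.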

2.2  Majorization  and  Schur-concavity 

As we restrict ourselves to the class of IRMs,³ the question naturally
arises as to whether popularity pmfs can be compared on the basis of
their skewness so that versions of the folk theorems discussed earlier can
be established. More formally, consider two IRMs with popularity pmfs
$p$ and $q$ (on $\mathcal{N}$), and let $M(p)$ and $M(q)$ denote their miss rates under
some cache replacement policy. We seek a way to formally compare the
pmf vectors $p$ and $q$, with the interpretation that if $p$ is less skewed than
$q$, then the IRM with popularity pmf $p$ has less locality of reference than
the IRM with popularity pmf $q$, and the folk theorem on miss rates holds
as

$$M(q) \le M(p). \tag{1.3}$$

We turn to the concept of majorization discussed in the monograph of
Marshall and Olkin (1979) as a way to characterize such imbalance in the
components of popularity pmfs. Motivated by our earlier discussion, we
say that the IRM with popularity pmf $p$ has less locality of reference than
the IRM with popularity pmf $q$ if $p$ is majorized by $q$, written $p \prec q$.
As elegantly demonstrated in the monograph of Marshall and Olkin
(1979), this notion has found widespread use in many diverse branches
of mathematics and their applications. What is more, comparison results
such as (1.3) can now be explored through the rich and structured class
of monotone functions associated with majorization, the so-called
Schur-convex/concave functions. In fact, the comparison (1.3) is essentially a
statement concerning the Schur-concavity of certain functionals.

³ This may not be too much of a limitation given that the IRM is the most basic request model;
it is often used for checking various properties, see e.g., Breslau et al. (1999). Moreover,
recent results by Jelenkovic and Radovanovic (2003) suggest some form of insensitivity to the
statistics of streams of requests. Of course, more work along these lines is needed.

Within this framework, if $p^*$ denotes the popularity pmf for the output
from the cache, then the folk theorem on the stream of misses takes the
form

$$p^* \prec p. \tag{1.4}$$

Both statements (1.3) and (1.4) were investigated in the context of a
cache operating under a demand-driven replacement policy when document
requests are modeled according to the IRM. We now summarize
some of the findings.

2.3  The  folk  theorems  under  RORA  policies 

In Vanichpun and Makowski (2004a) and Vanichpun and Makowski
(2004b), the authors have shown the validity of both statements (1.3)
and (1.4) for a number of policies, namely the optimal policy $A_0$, the
random policy and the First-In/First-Out (FIFO) policy. These properties
hold in all circumstances, i.e., for an arbitrary popularity pmf for
the IRM input and for arbitrary cache sizes. To the best of the authors’
knowledge, these results provide the first formal proof of the folk theorems.
In this chapter, we have extended these positive results to a very
large class of replacement policies, known as Random On-demand Replacement
Algorithms (RORA); these policies generalize the policy $A_0$,
the random policy and the FIFO policy.

2.4  Counterexamples  and  asymptotics 

However, there are policies for which the comparisons (1.3) and (1.4)
do not always hold. One such policy is the Least-Recently-Used (LRU)
replacement policy, a popular self-organizing eviction policy. Indeed, we
first exhibit situations where the miss rate of the LRU policy is smaller
when selecting an IRM with a more balanced popularity pmf. Yet, when
the popularity pmfs are Zipf-like, simulations show that the comparison
(1.3) still does hold for the LRU policy. We formally establish this fact
only in the limiting regime where the skewness parameter of the Zipf-like
pmf is large, i.e., when the pmf is highly skewed.

It also happens that the LRU policy fails to reduce locality of reference
in the sense of (1.4). We explore the issue through counterexamples
which are developed within the class of Zipf-like popularity pmfs. For
this class of input pmfs, we identify a condition involving the cache
size and the number of cacheable documents under which (1.4) fails to
occur at large enough values of the skewness parameter of the Zipf-like
pmf. Under this condition, which is reasonably satisfied in practice, we
show that the output pmf $p^*$ may not exhibit less locality of reference
than the input pmf $p$ when the latter has too much of it to begin with.
Additional simulations were carried out and suggested a conjecture as
to when LRU caching indeed reduces locality of reference with Zipf-like
input pmfs. All indications point to the possibility that for small enough
cache sizes, the desired comparison of $p$ and $p^*$ will hold; this will be
the subject of future investigation.

While  the  discussion  given  here  is  restricted  to  IRMs,  we  believe  that 
similar  results  may  hold  for  more  general  input  models. 

2.5  Organization 

The chapter is organized as follows: The basic model of cache management
is given in Section 1.3. The miss rate and output of a cache
are discussed in Sections 1.4 and 1.5, respectively. Majorization and the
companion notion of Schur-convexity are introduced in Sections 1.6 and
1.7, respectively. We obtain the basic comparison results for the output
in Section 1.8. The RORA cache policies are defined in Section 1.9,
and the comparison results for their miss rates and outputs are given in
Sections 1.10 and 1.11, respectively. Zipf-like distributions are discussed
in Section 1.12. Comparison results for the miss rate and output under
the LRU policy are collected in Sections 1.13 and 1.14, respectively.

3.  Demand-driven  caching 

Consider a universe $\mathcal{N} = \{1, \ldots, N\}$ of $N$ cacheable documents. The
system is composed of a server where a copy of each of these $N$ documents
is available, and of a cache of size $M$ ($1 < M < N$). Documents
are first requested at the cache: If the requested document has a copy
already in cache (i.e., a hit), this copy is downloaded from the cache by
the user. If the requested document is not in cache (i.e., a miss), a copy
is requested instead from the server to be put in the cache. If the cache
is already full, then a document already in cache is evicted to make room
for the copy of the document just requested. The document selected for
eviction is determined through a cache replacement or eviction policy.⁴

We now develop a mathematical framework to address some of
the issues discussed in this chapter. Additional details are available in
the monographs by Aven, Coffman and Kogan (1987) and by Coffman
and Denning (1973). We begin with some notation that will be used
repeatedly: Let $\Lambda^*(M;\mathcal{N})$ be the collection of all unordered subsets of
size $M$ of $\mathcal{N}$, and let $\Lambda(M;\mathcal{N})$ be the collection of all ordered sequences of
$M$ distinct elements from $\mathcal{N}$. We write $\{i_1, \ldots, i_M\}$ (resp. $(i_1, \ldots, i_M)$)
to denote an element in $\Lambda^*(M;\mathcal{N})$ (resp. $\Lambda(M;\mathcal{N})$).

⁴ We use the terms interchangeably.

3.1  A  simple  framework 

Consecutive user requests are modeled by a sequence of $\mathcal{N}$-valued rvs
$\{R_t, \ t = 0, 1, \ldots\}$. For simplicity we say that request $R_t$ occurs at time
$t = 0, 1, \ldots$. Let $S_t$ denote the cache just before time $t$ so that $S_t$ is a
subset of $\mathcal{N}$ with at most $M$ elements. The decision to be performed
according to the eviction policy in force is the identity $U_t$ of the document
in $S_t$ which needs to be evicted in order to make room for the request
$R_t$ (if the cache is already full).

Demand-driven caching considered here is characterized by the dynamics

$$S_{t+1} = \begin{cases} S_t & \text{if } R_t \in S_t \\ S_t + R_t & \text{if } R_t \notin S_t, \ |S_t| < M \\ S_t - U_t + R_t & \text{if } R_t \notin S_t, \ |S_t| = M \end{cases} \tag{1.5}$$

for all $t = 0, 1, \ldots$, where $|S_t|$ denotes the cardinality of the set $S_t$, and
$S_t - U_t + R_t$ denotes the subset of $\{1, \ldots, N\}$ obtained from $S_t$ by removing
$U_t$ and then adding $R_t$ to it, in that order. These dynamics reflect
the following operational assumptions: (i) actions are taken only at the
time requests are made, hence the expression demand-driven caching;
(ii) a requested document not in cache is always added to the cache if
the cache is not full at the time of request; and (iii) eviction is mandatory
if the request $R_t$ is not in cache $S_t$ and the cache $S_t$ is full, i.e.,
$|S_t| = M$.
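
The three cases of the dynamics (1.5) can be sketched as a single transition function. This is a minimal illustration under assumptions made here: the names `step`, `choose_victim` and `evict_smallest_label` are hypothetical, and the toy eviction rule (evict the smallest label) is chosen only so that the example is deterministic.

```python
def step(cache, request, capacity, choose_victim):
    """One transition of the dynamics (1.5); returns (next_cache, was_miss)."""
    if request in cache:                       # hit: the cache is unchanged
        return cache, False
    if len(cache) < capacity:                  # miss, cache not yet full: add
        return cache | {request}, True
    victim = choose_victim(cache, request)     # miss, cache full: evict then add
    if victim not in cache:
        raise ValueError("the evicted document must be in the cache")
    return (cache - {victim}) | {request}, True

def evict_smallest_label(cache, request):
    """Toy eviction rule used only for this illustration."""
    return min(cache)

cache = frozenset()
misses = 0
for r in [1, 2, 1, 3, 2, 4]:
    cache, miss = step(cache, r, capacity=2, choose_victim=evict_smallest_label)
    misses += miss
print(sorted(cache), misses)
```

Passing the eviction rule as a function keeps the demand-driven mechanics separate from the policy, which mirrors the way the decision $U_t$ enters (1.5).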


3.2  Admissible  IRMs  and  reduced  dynamics 

Throughout, the stream of requests $\{R_t, \ t = 0, 1, \ldots\}$ is modeled according
to the standard Independent Reference Model (IRM) with popularity
pmf $p = (p(1), \ldots, p(N))$. To avoid uninteresting situations, it
is always the case that

$$p(i) > 0, \qquad i = 1, \ldots, N. \tag{1.6}$$

A pmf $p$ on $\{1, \ldots, N\}$ satisfying (1.6) is said to be admissible.

Under this non-triviality condition (1.6), every document will eventually
be requested by virtue of (1.1). Thus, as we have in mind to
study long term characteristics under demand-driven replacement policies,
there is no loss of generality in assuming (as we do from now on)
that the cache is full, i.e., for all $t = 0, 1, \ldots$, we have $|S_t| = M$ and (1.5)
simplifies to

$$S_{t+1} = \begin{cases} S_t & \text{if } R_t \in S_t \\ S_t - U_t + R_t & \text{if } R_t \notin S_t. \end{cases} \tag{1.7}$$

3.3  Cache  states  and  eviction  policies 

The  decisions  {Ut,  t  =  0,1,...}  are  determined  through  an  eviction 
policy  and  several  examples  will  be  presented  shortly. 

Consider a given eviction policy $\pi$. We assume that the dynamics
of the cache can be characterized through the evolution of suitably defined
variables $\{\Omega_t, \ t = 0, 1, \ldots\}$ where $\Omega_t$ is known as the state of the
cache at time $t$. The cache state is specific to the eviction policy and
is selected with the following in mind: (i) The set $S_t$ of documents in
the cache at time $t$ can be recovered from $\Omega_t$; (ii) the cache state $\Omega_{t+1}$
is fully determined through the knowledge of the triple $(\Omega_t, R_t, U_t)$ in
a way that is compatible with the dynamics (1.7); and (iii) the eviction
decision $U_t$ at time $t$ can be expressed as a function of the past
$(\Omega_0, R_0, U_0, \ldots, \Omega_{t-1}, R_{t-1}, U_{t-1}, \Omega_t, R_t)$ (possibly through suitable
randomization), i.e., for each $t = 0, 1, \ldots$, there exists a mapping $\pi_t$ such
that

$$U_t = \pi_t(\Omega_0, R_0, U_0, \ldots, \Omega_{t-1}, R_{t-1}, U_{t-1}, \Omega_t, R_t; \Xi_t)$$

where the rv $\Xi_t$ is taken independent of the past $(\Omega_0, R_0, \ldots, U_{t-1}, \Omega_t, R_t)$.
Collectively the mappings $\{\pi_t, \ t = 0, 1, \ldots\}$ define the eviction policy $\pi$.

We  close  this  section  with  some  examples  of  eviction  policies  which 
have  been  discussed  in  the  literature,  see  e.g.,  the  monographs  by  Aven, 
Coffman  and  Kogan  (1987)  and  by  Coffman  and  Denning  (1973): 

According to the random policy, when the cache is full, the document
to be evicted is selected randomly from the cache according to the
uniform distribution.

Any permutation $\sigma$ of $\{1, \ldots, N\}$ induces an ordering of the documents
by considering the documents $\sigma(1), \sigma(2), \ldots, \sigma(N)$ as “ranked”
in decreasing order. This ranking of the documents allows us to define
the eviction policy $A_\sigma$ as follows: When at time $t = 0, 1, \ldots$, the
cache $S_t$ is full and the requested document $R_t$ is not in the cache, the
policy $A_\sigma$ prescribes the eviction of the document $U_t$ given by
$U_t = \arg\max \left( \sigma^{-1}(j) : \ j \in S_t \right)$. The documents $\sigma(1), \ldots, \sigma(M-1)$, once
loaded in the cache, will remain there, and in the steady state, the cache
under the policy $A_\sigma$ will contain the documents $\sigma(1), \ldots, \sigma(M-1)$.

The so-called policy $A_0$ is associated with the underlying popularity
pmf $p$ of the request stream, and evicts the least popular document in
the cache, i.e., $U_t = \arg\min \left( p(j) : \ j \in S_t \right)$ for each $t = 0, 1, \ldots$. This
policy $A_0$ coincides with the policy $A_{\sigma^*}$ associated with the permutation
$\sigma^*$ of $\{1, \ldots, N\}$ which orders the components of the underlying pmf $p$
in decreasing order, namely $p(\sigma^*(1)) \ge p(\sigma^*(2)) \ge \ldots \ge p(\sigma^*(N))$.

Under the random policy and the policies $A_\sigma$, we can take the cache
state to be the (unordered) set of documents in the cache, i.e., the cache
state is an element of $\Lambda^*(M;\mathcal{N})$ and $\Omega_t = S_t$ for all $t = 0, 1, \ldots$.

The FIFO policy replaces the document which has been in cache for
the longest time, while the LRU policy evicts the least recently requested
document already in cache. The definitions of the FIFO and LRU policies
necessitate that the cache state be an element of $\Lambda(M;\mathcal{N})$ with $\Omega_t$
being a permutation of the elements in $S_t$ for all $t = 0, 1, \ldots$.
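
The policies just described can be compared on a short request trace with the following sketch. It is an illustration under assumptions made here, not the authors' implementation: the function name `simulate`, the string labels for the policies, and the choice of an ordered list as the cache state (front of the list is the next victim under FIFO and LRU) are all hypothetical.

```python
import random

def simulate(requests, capacity, policy, p=None, seed=0):
    """Run a demand-driven cache and return (final_state, number_of_misses).

    The state is a list; its order matters for FIFO and LRU (front = next
    victim), while only its contents matter for the random policy and A_0.
    """
    rng = random.Random(seed)
    state, misses = [], 0
    for r in requests:
        if r in state:
            if policy == "lru":          # a hit refreshes recency under LRU only
                state.remove(r)
                state.append(r)
            continue
        misses += 1
        if len(state) == capacity:
            if policy in ("fifo", "lru"):
                victim = state[0]        # oldest arrival / least recently requested
            elif policy == "random":
                victim = rng.choice(state)
            elif policy == "a0":         # least popular cached document, needs p
                victim = min(state, key=lambda j: p[j - 1])
            else:
                raise ValueError(f"unknown policy: {policy}")
            state.remove(victim)
        state.append(r)
    return state, misses

requests = [1, 2, 1, 3, 2, 1]
print(simulate(requests, 2, "fifo"))
print(simulate(requests, 2, "lru"))
print(simulate(requests, 2, "a0", p=[0.5, 0.3, 0.2]))
```

On this trace the only difference between FIFO and LRU is that a hit moves the document to the back of the list under LRU, which is why the two policies need an ordered cache state while the random policy and $A_0$ do not.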

4.  The  miss  rate  of  a  cache 

A  standard  performance  metric  to  compare  various  caching  policies  is 
the  miss  rate  of  the  cache.  This  quantity  has  the  interpretation  of  being 
the  long-term  frequency  of  the  event  that  the  requested  document  is 
not  in  the  cache,  and  therefore  determines  the  effectiveness  of  a  caching 
policy. 

Under a cache replacement policy $\pi$, the miss rate $M_\pi(p)$ is defined
as the a.s. limit

$$M_\pi(p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau] \quad \text{a.s.} \tag{1.8}$$

where $S_\tau$ denotes the set of documents in cache operating under the
replacement policy $\pi$ at time $\tau$ when the input to the cache is the request
stream $\{R_t, \ t = 0, 1, \ldots\}$. Almost sure convergence in (1.8) (and
elsewhere) is taken under the probability measure on the sequence of rvs
$\{\Omega_t, R_t, U_t, \ t = 0, 1, \ldots\}$ induced by the underlying IRM with popularity
pmf $p$ through the eviction policy $\pi$.

Under most cache replacement policies of interest, the limit (1.8) exists
and admits a simple expression under the assumption that the a.s.
limit

$$Q_\pi(s;p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \quad \text{a.s.} \tag{1.9}$$

exists for each element $s$ in $\Lambda^*(M;\mathcal{N})$. Although the limits (1.8) and
(1.9) are often constants which are independent of the initial cache state
$\Omega_0$, this is not always the case, as will be seen in the discussion of RORA
policies in Sections 1.9 and 1.10.

Theorem 1.1 Consider an eviction policy $\pi$ such that the limits (1.9)
exist under the IRM with popularity pmf $p$. Then, the limit (1.8) exists
and is given by

\begin{align}
M_\pi(p) &= \sum_{i=1}^{N} p(i) \sum_{s \in \Lambda_i^*(M;\mathcal{N})} Q_\pi(s;p) \tag{1.10} \\
&= \sum_{s \in \Lambda^*(M;\mathcal{N})} Q_\pi(s;p) \sum_{i \notin s} p(i) \tag{1.11}
\end{align}

where $\Lambda_i^*(M;\mathcal{N})$ denotes the set of elements in $\Lambda^*(M;\mathcal{N})$ which do not
contain $i$, i.e., $\Lambda_i^*(M;\mathcal{N}) := \{ s = \{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N}) : \ i \notin s \}$.

Theorem 1.1 is a standard result under IRMs; its proof can also be found
in Vanichpun (2004). The existence of the limits (1.9) is a mild assumption
which is satisfied under all eviction policies of interest considered
here (and in the literature). Indeed, under the IRM with popularity
pmf $p$, the sequence of cache states $\{\Omega_t, \ t = 0, 1, \ldots\}$ typically forms
a Markov chain over a finite state space, and standard ergodic results
readily yield the existence of the limits (1.9). This issue will be briefly
discussed in each situation at the appropriate time.
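
As an illustration of Theorem 1.1, the sketch below builds the Markov chain of cache contents under the random policy, approximates its stationary pmf by plain power iteration, and then evaluates the miss rate through (1.11). The function names, the number of sweeps and the example pmfs are assumptions made here for illustration; the chapter itself does not prescribe a numerical method.

```python
from itertools import combinations

def random_policy_stationary(p, M, sweeps=5000):
    """Approximate the stationary pmf Q(s) of the cache content under the random policy.

    States are the M-subsets of {1..N}; the chain is iterated from the
    uniform distribution (plain power iteration, no external libraries).
    """
    N = len(p)
    states = [frozenset(c) for c in combinations(range(1, N + 1), M)]
    Q = {s: 1.0 / len(states) for s in states}
    for _ in range(sweeps):
        nxt = dict.fromkeys(states, 0.0)
        for s, mass in Q.items():
            for i in range(1, N + 1):
                if i in s:                       # hit: the state is unchanged
                    nxt[s] += mass * p[i - 1]
                else:                            # miss: evict j uniformly from s
                    for j in s:
                        nxt[(s - {j}) | {i}] += mass * p[i - 1] / M
        Q = nxt
    return Q

def miss_rate(p, Q):
    """Formula (1.11): sum over states of Q(s) times the popularity mass outside s."""
    return sum(q * sum(p[i - 1] for i in range(1, len(p) + 1) if i not in s)
               for s, q in Q.items())

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(miss_rate(uniform, random_policy_stationary(uniform, 2)))
print(miss_rate(skewed, random_policy_stationary(skewed, 2)))
```

In this small example the skewed pmf yields the smaller miss rate, which is the direction predicted by the folk theorem on miss rates for the random policy.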

5.  The  output  of  a  cache 
5.1  Definitions 

Under the demand-driven caching operation (1.7), the output of the
cache is the sequence of requests that incur a miss, i.e., when the incoming
request cannot find the desired document in the cache. More
precisely, a miss occurs at time $t$ if $R_t$ is not in $S_t$. Thus, we define
recursively the time indices $\{\nu_k, \ k = 0, 1, \ldots\}$ by

$$\nu_0 = 0; \qquad \nu_{k+1} := \nu_k + \mu_{k+1}, \qquad k = 0, 1, \ldots$$

with

$$\mu_{k+1} := \inf \left\{ t = 1, 2, \ldots : \ R_{\nu_k + t} \notin S_{\nu_k + t} \right\}$$

where we use the convention $\mu_{k+1} = \infty$ if either $\nu_k = \infty$ or if $\nu_k$ is
finite but the set of indices entering the definition of $\mu_{k+1}$ is empty.
With $\delta$ denoting an element not in $\mathcal{N}$, we define the output process
$\{R_k^*, \ k = 1, 2, \ldots\}$ simply as

$$R_k^* := \begin{cases} R_{\nu_k} & \text{if } \nu_k < \infty \\ \delta & \text{if } \nu_k = \infty \end{cases}$$

for each $k = 1, 2, \ldots$. The requests $\{R_k^*, \ k = 1, 2, \ldots\}$ are those requests
among $\{R_t, \ t = 0, 1, \ldots\}$ which incur a miss and which get forwarded to
the server (or to the higher level cache in a hierarchical caching system).
The statistics of the output stream $\{R_k^*, \ k = 1, 2, \ldots\}$ are determined
by the statistics of the input stream $\{R_t, \ t = 0, 1, \ldots\}$ and by the cache
replacement policy $\pi$ in use. We are interested in evaluating the popularity
pmf $p_\pi^* = (p_\pi^*(1), \ldots, p_\pi^*(N))$ defined by

$$p_\pi^*(i) := \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}[R_k^* = i] \quad \text{a.s.} \tag{1.12}$$

for each $i = 1, 2, \ldots, N$, whenever these limits exist.


5.2  Finding  p* 

The  remainder  of  this  section  is  devoted  to  the  existence  and  form  of 
the  limits  (1.12). 


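Theorem 1.2 Consider an eviction policy $\pi$ such that the limits (1.9)
exist under the IRM with popularity pmf $p$. For each $i = 1, \ldots, N$, the
limit (1.12) exists and is given by

$$p_\pi^*(i) = \frac{p(i) \, m_\pi(i;p)}{\sum_{j=1}^{N} p(j) \, m_\pi(j;p)} \tag{1.13}$$

where we have set

$$m_\pi(i;p) := \sum_{s \in \Lambda_i^*(M;\mathcal{N})} Q_\pi(s;p). \tag{1.14}$$

A proof of Theorem 1.2 is given in Vanichpun and Makowski (2004b).
Note that the existence of the limits (1.9) implies

\begin{align}
m_\pi(i;p) &= \sum_{s \in \Lambda_i^*(M;\mathcal{N})} \left( \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \right) \nonumber \\
&= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \sum_{s \in \Lambda_i^*(M;\mathcal{N})} \mathbf{1}[S_\tau = s] \nonumber \\
&= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau] \quad \text{a.s.} \tag{1.15}
\end{align}

for each $i = 1, \ldots, N$, and $m_\pi(i;p)$ thus represents the fraction of time
that document $i$ will not be in the cache. This quantity is determined
by the popularity pmf $p$ of the input to the cache and by the eviction
policy $\pi$ in use.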
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Inspection  of  (1.10)  and  (1.14)  reveals  that 

N 

^2p(i)mn(i;p)  =  Mn(p) 

i= 1 


and  this  leads  via  (1.13)  to  a  simple  connection  between  the  miss  rate 
of  an  eviction  policy  and  the  pmf  of  its  output  in  the  form 


Pl(i) 


p(i)mn(i-,p) 

Mi r(p) 


i  =  l,...,N. 


(1.16) 


Thus,  p*(i)  can  be  viewed  as  the  ratio  between  the  miss  rate  of  the  cache 
when  the  requested  document  is  i  and  the  overall  miss  rate  of  the  cache. 
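The relation (1.16) is easy to exercise numerically. The following Python sketch is only an illustration: the function name `output_pmf` and the values chosen for the out-of-cache fractions `m` are hypothetical and are not taken from the text; any policy for which the limits (1.9) exist would supply its own values $m_\pi(i;p)$.

```python
from typing import List, Sequence


def output_pmf(p: Sequence[float], m: Sequence[float]) -> List[float]:
    """Output popularity pmf in the form (1.16).

    p[i] is the input popularity of document i and m[i] the long-run
    fraction of time document i is NOT in the cache.
    """
    if len(p) != len(m):
        raise ValueError("p and m must have the same length")
    weights = [pi * mi for pi, mi in zip(p, m)]
    miss_rate = sum(weights)  # overall miss rate, sum_i p(i) m(i)
    if miss_rate <= 0:
        raise ValueError("the overall miss rate must be positive")
    return [w / miss_rate for w in weights]


# Hypothetical numbers, for illustration only.
p = [0.5, 0.3, 0.2]
m = [0.2, 0.5, 0.8]
p_star = output_pmf(p, m)
```

In this made-up example the most popular document is rarely missed, so its share of the output stream is smaller than its share of the input stream.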


6.  Majorization  —  A  primer 

The concept of majorization provides a powerful tool to formalize
statements concerning the relative skewness in the components of two
vectors, viz., that the components $(x_1, \ldots, x_N)$ of the vector $x$ are more
evenly spread out, or “more balanced”, than the components $(y_1, \ldots, y_N)$ of
the vector $y$. For vectors $x$ and $y$ in $\mathbb{R}^N$, we say that $x$ is majorized by
$y$, and write $x \prec y$, whenever the conditions

$$ \sum_{i=1}^{n} x_{[i]} \le \sum_{i=1}^{n} y_{[i]}, \quad n = 1, 2, \ldots, N-1 \tag{1.17} $$

$$ \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i \tag{1.18} $$

hold with $x_{[1]} \ge x_{[2]} \ge \ldots \ge x_{[N]}$ and $y_{[1]} \ge y_{[2]} \ge \ldots \ge y_{[N]}$ denoting
the components of $x$ and $y$ arranged in decreasing order, respectively.
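The two conditions (1.17) and (1.18) translate directly into a short test. The sketch below is a minimal illustration; the name `is_majorized` and the tolerance argument `tol` are our own choices, introduced to absorb floating-point rounding.

```python
from typing import Sequence


def is_majorized(x: Sequence[float], y: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if x is majorized by y, i.e. (1.17) and (1.18) hold."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    xs = sorted(x, reverse=True)  # components in decreasing order
    ys = sorted(y, reverse=True)
    # Condition (1.18): the totals agree.
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    # Condition (1.17): partial sums of x never exceed those of y.
    partial_x = partial_y = 0.0
    for xi, yi in zip(xs[:-1], ys[:-1]):
        partial_x += xi
        partial_y += yi
        if partial_x > partial_y + tol:
            return False
    return True


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.4, 0.3, 0.2, 0.1]
point_mass = [1.0, 0.0, 0.0, 0.0]
```

With these vectors, `is_majorized(uniform, skewed)` and `is_majorized(skewed, point_mass)` both hold, while `is_majorized(skewed, uniform)` does not: among pmfs the uniform pmf is the most balanced and a point mass the most skewed.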

We begin with a sufficient condition for majorization which is
extracted from the discussion in Marshall and Olkin (1979), B.1, p. 129.

PROPOSITION 1.3 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that
(1.18) holds. Whenever $x_1 \ge x_2 \ge \ldots \ge x_N$, if there exists some $k =
1, \ldots, N-1$ such that $x_i \le y_i$, $i = 1, \ldots, k$, and $x_i \ge y_i$, $i = k+1, \ldots, N$,
then the comparison $x \prec y$ holds.


The following sufficient condition for majorization will be useful in the
sequel; it was already announced in Marshall and Olkin (1979), B.1.b,
p. 129, without proof.

Theorem 1.4 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that (1.18)
holds. Whenever $x_1 \ge x_2 \ge \ldots \ge x_N > 0$, and the ratios $y_i / x_i$,
$i = 1, \ldots, N$, are decreasing in $i$, we have the comparison $x \prec y$.

With any element $x$ of $\mathbb{R}^N$ such that $\sum_{i=1}^{N} x_i \neq 0$, we associate the
normalized vector $\bar{x}$ as the element of $\mathbb{R}^N$ defined by

$$ \bar{x} := \Big( \sum_{i=1}^{N} x_i \Big)^{-1} (x_1, \ldots, x_N). $$

With  this  notation  we  can  present  a  useful  corollary  to  Theorem  1.4. 

COROLLARY 1.5 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that
$y_i > 0$. Whenever $x_1 \ge x_2 \ge \ldots \ge x_N > 0$, and the ratios $y_i / x_i$,
$i = 1, \ldots, N$, are decreasing in $i$, we have the comparison $\bar{x} \prec \bar{y}$.

The  following  reformulation  of  Corollary  1.5  is  used  in  the  sequel. 

Lemma 1.6 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that $x_i > 0$,
$i = 1, \ldots, N$, and $y_i > 0$. If $\frac{y_i}{x_i} \le \frac{y_j}{x_j}$ whenever $x_j > x_i$ for distinct
$i, j = 1, \ldots, N$, then the comparison $\bar{x} \prec \bar{y}$ holds.

7.  Schur-convexity 

Key to the power of majorization is the companion notion of
monotonicity associated with it: An $\mathbb{R}$-valued function $\varphi$ defined on a set $A$
of $\mathbb{R}^N$ is said to be Schur-convex (resp. Schur-concave) on $A$ if

$$ \varphi(x) \le \varphi(y) \quad (\text{resp. } \varphi(x) \ge \varphi(y)) $$

whenever $x$ and $y$ are elements in $A$ satisfying $x \prec y$. In other words,
Schur-convexity (resp. Schur-concavity) corresponds to being monotone
increasing (resp. decreasing) with respect to majorization (viewed as a
preorder on subsets of $\mathbb{R}^N$).

Let $\sigma$ denote a permutation of $\{1, \ldots, N\}$. With any element $x$ in
$\mathbb{R}^N$, we associate the permuted vector $\sigma(x)$ in $\mathbb{R}^N$ through the relation

$$ \sigma(x) := (x_{\sigma(1)}, \ldots, x_{\sigma(N)}). $$

Let $\{\sigma_i,\ i = 1, \ldots, N!\}$ be a given enumeration of all the $N!$
permutations of $\{1, \ldots, N\}$; this enumeration is held fixed throughout the
chapter. A subset $A$ of $\mathbb{R}^N$ is said to be symmetric if for any $x$ in $A$, the
element $\sigma_i(x)$ also belongs to $A$ for each $i = 1, \ldots, N!$. Moreover, for
any subset $A$ of $\mathbb{R}^N$, a mapping $\varphi : A \to \mathbb{R}$ is said to be symmetric
if $A$ is symmetric and for any $x$ in $A$, we have $\varphi(\sigma_i(x)) = \varphi(x)$ for
each $i = 1, \ldots, N!$. If the mapping $\varphi : A \to \mathbb{R}$ is Schur-convex (resp.
Schur-concave) with symmetric $A$, then $\varphi$ is necessarily symmetric since
$\sigma_i(x) \prec x \prec \sigma_i(x)$ implies $\varphi(\sigma_i(x)) = \varphi(x)$ for each $i = 1, \ldots, N!$.

In the following, we have collected some useful technical results
concerning Schur-concavity. As in Marshall and Olkin (1979), p. 78, for each
$M = 1, \ldots, N$, the elementary symmetric function $E_{M,N} : \mathbb{R}^N \to \mathbb{R}$ is
defined by

$$ E_{M,N}(x) := \sum_{1 \le i_1 < \ldots < i_M \le N} x_{i_1} \cdots x_{i_M}, \quad x \in \mathbb{R}^N. \tag{1.19} $$

By convention we write $E_{0,N}(x) = 1$ for all $x$ in $\mathbb{R}^N$. It is well known
(Marshall and Olkin (1979), Prop. F.1, p. 78) that the function $E_{M,N}$ is
Schur-concave on $\mathbb{R}^N_+$ for each $M = 0, 1, \ldots, N$.
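Elementary symmetric functions appear in all the closed-form expressions that follow, so it helps to have a way of evaluating them without enumerating subsets. The sketch below uses the standard one-pass recursion on the coefficients of $\prod_i (1 + x_i z)$; the function name `elementary_symmetric` is our own.

```python
from typing import Sequence


def elementary_symmetric(x: Sequence[float], M: int) -> float:
    """Evaluate E_{M,N}(x) of (1.19) in O(N * M) operations.

    After processing the first n components, e[k] holds the k-th
    elementary symmetric function of those n components.
    """
    if M < 0 or M > len(x):
        raise ValueError("need 0 <= M <= len(x)")
    e = [1.0] + [0.0] * M  # convention: E_0 = 1
    for xi in x:
        # Update from high order to low so each x_i is used at most once.
        for k in range(M, 0, -1):
            e[k] += xi * e[k - 1]
    return e[M]


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.4, 0.3, 0.2, 0.1]
e2_uniform = elementary_symmetric(uniform, 2)  # 6 pairs, each 0.0625
e2_skewed = elementary_symmetric(skewed, 2)
```

Here `uniform` is majorized by `skewed`, and accordingly `e2_uniform` (0.375) is at least `e2_skewed` (0.35), as Schur-concavity of $E_{2,4}$ requires.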

The  following  result  is  due  to  Schur  (see  Marshall  and  Olkin  (1979), 
F.3,  p.  80)  and  will  be  key  to  a  number  of  proofs. 


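PROPOSITION 1.7 For each $M = 1, \ldots, N$, the mapping $\Phi_{M,N} : \mathbb{R}^N_+ \to
\mathbb{R}$, given by (see footnote 5)

$$ \Phi_{M,N}(x) := \frac{E_{M,N}(x)}{E_{M-1,N}(x)}, \quad x \in \mathbb{R}^N_+, $$

is increasing (see footnote 6), symmetric and concave, thus increasing and
Schur-concave on $\mathbb{R}^N_+$.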


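With vectors $t$ and $x$ in $\mathbb{R}^N$, we associate the element $t \cdot x$ of $\mathbb{R}^N$
defined by $t \cdot x := (t_1 x_1, \ldots, t_N x_N)$. With this notation we can state

PROPOSITION 1.8 Assume the mapping $\varphi : \mathbb{R}^N_+ \to \mathbb{R}$ to be concave and
the mapping $h : \mathbb{R}^{N!} \to \mathbb{R}$ to be increasing, symmetric and concave. For
any non-zero vector $t$ in $\mathbb{R}^N_+$, the mapping $\psi_t : \mathbb{R}^N_+ \to \mathbb{R}$ defined by

$$ \psi_t(x) := h\big( \varphi(t \cdot \sigma_1(x)), \ldots, \varphi(t \cdot \sigma_{N!}(x)) \big), \quad x \in \mathbb{R}^N_+, $$

is symmetric and concave, thus Schur-concave on $\mathbb{R}^N_+$.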


8.  Comparing  input  and  output 

Recall that we have in mind to compare the strength of locality of
reference in two streams of requests through a majorization ordering of
their popularity pmfs. The next result constitutes a first step in the
process of comparing input and output popularity pmfs.

5 For $x$ in $\mathbb{R}^N_+$ such that $E_{M-1,N}(x) = 0$, we have $E_{M,N}(x) = 0$ and we set
$\Phi_{M,N}(x) = 0$ by continuity.

6 Here, increasing means increasing in each argument.

Theorem 1.9 Consider an eviction policy $\pi$ such that the limits (1.9)
exist under the IRM with popularity pmf $p$. If $m_\pi(i;p) \ge m_\pi(j;p)$
whenever $p(i) m_\pi(i;p) < p(j) m_\pi(j;p)$ for distinct $i, j = 1, \ldots, N$, then
it holds that $p^*_\pi \prec p$ provided $m_\pi(i;p) > 0$ for each $i = 1, \ldots, N$.


Proof. This claim is a simple consequence of Lemma 1.6: We take
$y = p$ and $x$ given by $x_i = p(i) m_\pi(i;p)$, $i = 1, \ldots, N$. Thus, we have
$\bar{x} = p^*_\pi$ while $\bar{y} = p$, and the requisite monotonicity assumptions hold. ■


The assumptions of Theorem 1.9 ensure that $m_\pi(i;p) \le m_\pi(j;p)$
and $p(j) < p(i)$ occur simultaneously for distinct $i, j = 1, \ldots, N$. This
leads to defining a caching algorithm $\pi$ as good if for every admissible
pmf $p$, we have $m_\pi(i;p) \le m_\pi(j;p)$ whenever $p(j) < p(i)$ for distinct
$i, j = 1, \ldots, N$. Thus, a caching policy which satisfies the assumptions
of Theorem 1.9 is necessarily a good policy. However, as we shall see in
the case of the LRU policy, this by itself is not sufficient to ensure that
the output popularity pmf is more balanced than the input popularity
pmf.

Repeatedly we shall encounter output pmfs which assume the generic
form used in Theorem 1.10 below.


Theorem 1.10 Let $p$ be an admissible pmf on $\mathcal{N}$, and for each $i =
1, \ldots, N$, define an $(N-1)$-dimensional vector

$$ p^{(i)} := (p(1), \ldots, p(i-1), p(i+1), \ldots, p(N)). $$

For each $M = 1, 2, \ldots, N-1$, the pmf $p^*_M$ on $\mathcal{N}$ defined by

$$ p^*_M(i) := \frac{p(i)\, E_{M,N-1}(p^{(i)})}{\sum_{j=1}^{N} p(j)\, E_{M,N-1}(p^{(j)})}, \quad i = 1, \ldots, N, \tag{1.20} $$

satisfies the comparison $p^*_M \prec p$.

A  proof  of  this  theorem  builds  on  Lemma  1.6  and  is  given  in  Vanichpun 
and  Makowski  (2004b). 
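Theorem 1.10 can be checked numerically on small examples. The following sketch evaluates the pmf (1.20) by removing one component at a time and then tests the majorization conditions (1.17) and (1.18) directly; the helper names (`esf`, `output_pmf_1_20`, `is_majorized`) and the sample pmf are illustrative assumptions of ours.

```python
from typing import List, Sequence


def esf(x: Sequence[float], M: int) -> float:
    """Elementary symmetric function E_M of the components of x."""
    e = [1.0] + [0.0] * M
    for xi in x:
        for k in range(M, 0, -1):
            e[k] += xi * e[k - 1]
    return e[M]


def output_pmf_1_20(p: Sequence[float], M: int) -> List[float]:
    """The pmf p*_M of (1.20), for 1 <= M <= N - 1."""
    p = list(p)
    if not 1 <= M <= len(p) - 1:
        raise ValueError("need 1 <= M <= N - 1")
    # p[:i] + p[i+1:] is the (N-1)-dimensional vector p^(i).
    weights = [p[i] * esf(p[:i] + p[i + 1:], M) for i in range(len(p))]
    total = sum(weights)
    return [w / total for w in weights]


def is_majorized(x: Sequence[float], y: Sequence[float], tol: float = 1e-9) -> bool:
    """True if x is majorized by y."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    sx = sy = 0.0
    for xi, yi in zip(xs[:-1], ys[:-1]):
        sx += xi
        sy += yi
        if sx > sy + tol:
            return False
    return True


p = [0.4, 0.3, 0.2, 0.1]
outputs = {M: output_pmf_1_20(p, M) for M in (1, 2, 3)}
```

For this choice of `p`, each of the three output pmfs is majorized by `p`. With M = N - 1 = 3 every weight equals the product of all four probabilities, so the output pmf is uniform.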


9.  Random  on-demand  replacement 

We now introduce a large class of demand-driven eviction policies
called Random On-demand Replacement Algorithms (RORA). This class
of policies generalizes many well-known caching policies, e.g., the
random and FIFO policies, as well as the optimal policy $A_0$. Moreover,
the Partially Preloaded Random Replacement Algorithms proposed by
Gelenbe (1973) form a subclass of RORAs.

9.1 Defining RORAs

A RORA policy follows the demand-driven caching rule (1.7) (under
the customary assumption that the cache is initially full) and is
characterized by an eviction/insertion pmf $r$ which we organize as the $M \times M$
matrix $r = (r_{k\ell})$, i.e., for each $k, \ell = 1, \ldots, M$, we have $r_{k\ell} \ge 0$ and
$\sum_{k=1}^{M} \sum_{\ell=1}^{M} r_{k\ell} = 1$. The RORA associated with the pmf matrix $r$ is
denoted by RORA($r$).

We select the cache state $\Omega_t$ at time $t$ to be an element $(i_1, \ldots, i_M)$
of $\Lambda(M;\mathcal{N})$ with the understanding that document $i_k$, $k = 1, \ldots, M$,
is in cache position $k$ at time $t$. RORA($r$) implements the following
eviction rule: Introduce a sequence of i.i.d. rvs $\{(X_t, Y_t),\ t = 0,1,\ldots\}$
taking values in $\{1, \ldots, M\} \times \{1, \ldots, M\}$ with common pmf $r$, i.e., for
each $t = 0, 1, \ldots$, we have

$$ P[(X_t, Y_t) = (k, \ell)] = r_{k\ell}, \quad k, \ell = 1, \ldots, M. $$

The sequences of rvs $\{(X_t, Y_t),\ t = 0,1,\ldots\}$ and $\{R_t,\ t = 0,1,\ldots\}$ are
assumed mutually independent. The document $U_t$ to be evicted at time
$t$ is given by

$$ U_t = \mathbf{1}[R_t \notin S_t]\, i_{X_t}. $$

We have $U_t = 0$ whenever $R_t \in S_t$, in which case no replacement occurs
and the cache state remains unchanged, i.e., $\Omega_{t+1} = \Omega_t$.

Next, if $R_t \notin S_t$ and $(X_t, Y_t) = (k, \ell)$, then $U_t = i_k$ (the document
at position $k$ is evicted) and the new document is inserted in the cache
at position $\ell$. If $k < \ell$, the documents $i_{k+1}, \ldots, i_\ell$ are shifted down
to positions $k, k+1, \ldots, \ell-1$ (in that order) while if $k > \ell$, the
documents $i_\ell, \ldots, i_{k-1}$ are shifted up to positions $\ell+1, \ldots, k$ (in that order).
When $k = \ell$, the new document simply replaces the evicted document
at position $k$.
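The eviction and insertion rule just described amounts to removing the entry at position $k$ and inserting the new document at position $\ell$, which shifts the intermediate documents exactly as stated. The sketch below illustrates one transition; the function names `rora_step` and `sample_positions` are our own, documents are labelled by positive integers so that 0 can signal "no eviction", and positions are 1-based as in the text.

```python
import random
from typing import List, Sequence, Tuple


def rora_step(cache: Sequence[int], request: int, k: int, l: int) -> Tuple[List[int], int]:
    """Apply one RORA transition with (X_t, Y_t) = (k, l), 1-based.

    Returns the new cache state and the evicted document U_t
    (0 when the request is a hit and nothing is evicted).
    """
    state = list(cache)
    if request in state:
        return state, 0  # hit: the state is unchanged
    evicted = state.pop(k - 1)    # evict the document at position k
    state.insert(l - 1, request)  # insert the new document at position l
    return state, evicted


def sample_positions(r: Sequence[Sequence[float]], rng: random.Random) -> Tuple[int, int]:
    """Draw (X_t, Y_t) from the M x M pmf matrix r (1-based result)."""
    M = len(r)
    pairs = [(k + 1, l + 1) for k in range(M) for l in range(M)]
    weights = [r[k][l] for k in range(M) for l in range(M)]
    return rng.choices(pairs, weights=weights, k=1)[0]


# An illustrative matrix with all mass on (k, l) = (1, M), here M = 3.
corner = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
k, l = sample_positions(corner, random.Random(0))
new_state, evicted = rora_step([1, 2, 3], 4, k, l)
```

With all the mass of `r` on the pair (1, M), a miss evicts the document at position 1 and appends the new one at position M, so `new_state` is `[2, 3, 4]` and `evicted` is 1.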

A document initially at position $i$ in the cache will never be replaced
if

$$ r_{k\ell} = 0 \quad \text{for } k \le i \le \ell \text{ and } \ell \le i \le k. \tag{1.21} $$

If we use row $i$ and column $i$ to partition the matrix $r$ into four blocks
(rows indexed by $k$, columns by $\ell$), then condition (1.21) expresses the
fact that the entries in the northeast and southwest corners all vanish
(including row $i$ and column $i$). Let $\Sigma_r$
denote the set of cache positions with the property that any document
initially put there will never be evicted during the operation of the cache,
i.e.,

$$ \Sigma_r := \{ i = 1, \ldots, M : \text{Eqn. (1.21) holds at } i \}. \tag{1.22} $$

Under the IRM with popularity pmf $p$, the cache states $\{\Omega_t,\ t =
0, 1, \ldots\}$ form a Markov chain on the state space $\Lambda(M;\mathcal{N})$. The ergodic
properties of this chain are determined by whether the set $\Sigma_r$ is empty or
not. This is discussed in Lemmas 1.11 and 1.12 in the next two sections;
they are established in Vanichpun (2004).

Throughout  this  discussion  we  always  assume  that  the  cache  size  M 
and  the  number  of  cacheable  documents  N  satisfy  M  + 1  <  N.  We  do  so 
in  order  to  avoid  technical  cases  of  limited  interest.  Indeed,  the  results 
here  are  still  valid  for  the  case  N  =  M  +  1,  but  require  slightly  different 
arguments.  We  refer  the  interested  reader  to  Vanichpun  (2004). 


9.2  Case  1 

The set $\Sigma_r$ is empty, so that every document in cache is eventually
replaced, i.e., for each $i = 1, \ldots, M$, there exists a pair $k, \ell$ (possibly
depending on $i$) with either $1 \le k \le i \le \ell \le M$ or $1 \le \ell \le i \le k \le M$
such that $r_{k\ell} > 0$. Here are some well-known policies which fall in
this case: The random policy corresponds to RORA($r$) with $r$ given by
$r_{kk} = \frac{1}{M}$ for each $k = 1, \ldots, M$. The FIFO policy also belongs to RORA
with two possibilities for $r$, namely $r_{1M} = 1$ or $r_{M1} = 1$. The first (resp.
second) choice corresponds to the cache state $(i_1, \ldots, i_M)$ being loaded
from left to right with documents ordered from the oldest to the most
recent (resp. from the most recent to the oldest).

In this case, the Markov chain $\{\Omega_t,\ t = 0,1,\ldots\}$ is ergodic on the
state space $\Lambda(M;\mathcal{N})$; its stationary distribution exists and is given in
the following lemma.


Lemma 1.11 Assume the input to be an IRM with popularity pmf $p$.
For RORA($r$) with $\Sigma_r$ empty, the cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$ form an
ergodic Markov chain on the state space $\Lambda(M;\mathcal{N})$ with stationary pmf
on $\Lambda(M;\mathcal{N})$ given by

$$ \pi_r(s;p) := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] = C(p)^{-1}\, p(i_1) p(i_2) \cdots p(i_M) \quad \text{a.s.} \tag{1.23} $$

for every $s = (i_1, \ldots, i_M)$ in $\Lambda(M;\mathcal{N})$ with normalizing constant

$$ C(p) := \sum_{(i_1, \ldots, i_M) \in \Lambda(M;\mathcal{N})} p(i_1) p(i_2) \cdots p(i_M). \tag{1.24} $$

The stationary pmf is the same for all RORAs in Case 1.

9.3  Case  2 

The set $\Sigma_r$ is not empty, and some documents, once put in cache,
will never be replaced during the operation of the cache, i.e., if $\Omega_0 =
(i_1, \ldots, i_M)$, then for all $t = 1, 2, \ldots$, with $\Omega_t = (j_1, \ldots, j_M)$, we have

$$ j_\ell = i_\ell, \quad \ell \in \Sigma_r. \tag{1.25} $$

Here are some examples of RORA policies in that category: As pointed
out in Section 1.3.3, any permutation $\sigma$ of $\{1, \ldots, N\}$ induces an
eviction policy $A_\sigma$ which evicts the “smallest” document in cache with
documents $\sigma(1), \sigma(2), \ldots, \sigma(N)$ “ranked” in decreasing order. The
documents $\sigma(1), \ldots, \sigma(M-1)$, once loaded in the cache, will remain there.
This behavior can be recovered through the RORA($r$) policy with
matrix $r$ of the form $r_{kk} = 1$ for some $k = 1, \ldots, M$, in which case $\Sigma_r$ has
$M - 1$ elements, namely $\{1, \ldots, k-1, k+1, \ldots, M\}$. If the documents
$\sigma(1), \ldots, \sigma(M-1)$ are initially put in cache (i.e., preloaded) at the other
positions $\ell \neq k$ in $\Sigma_r$, this RORA($r$) policy will behave like the policy
$A_\sigma$ in its steady state regime. The steady state behavior of the cache
under the policy $A_0$ introduced in Section 1.3.3 is that of the RORA($r$)
above, this time with the preloaded documents being the $M - 1$ most popular
documents.

To describe the long-run behavior of the cache states $\{\Omega_t,\ t = 0,1,\ldots\}$,
we go back to (1.25). First, with initial cache state $s_0 = (i_1, \ldots, i_M)$ in
$\Lambda(M;\mathcal{N})$, let $\Sigma_r(s_0)$ denote the set of initial documents with positions
in $\Sigma_r$, i.e.,

$$ \Sigma_r(s_0) := \{ i_\ell : \ell \in \Sigma_r \}. \tag{1.26} $$

Next, we introduce the component

$$ \Lambda(r, s_0) := \{ (j_1, \ldots, j_M) \in \Lambda(M;\mathcal{N}) : j_\ell = i_\ell,\ \ell \in \Sigma_r \}. \tag{1.27} $$

In view of (1.25), once the cache state is in $\Lambda(r, s_0)$, it remains there
forever. In fact all the states in the component $\Lambda(r, s_0)$ communicate
with each other, and this set of states is closed under the motion of the
Markov chain. With $m = |\Sigma_r|$, there are $\binom{N-m}{M-m} (M-m)!$ elements in
$\Lambda(r, s_0)$ and there are $\binom{N}{m}\, m!$ distinct components which form a
partition of $\Lambda(M;\mathcal{N})$. As a
result, when restricted to $\Lambda(r, s_0)$, this Markov chain is irreducible and
aperiodic, and its ergodic behavior can be characterized as follows:

Lemma 1.12 Assume the input to be an IRM with popularity pmf $p$.
For RORA($r$) with $|\Sigma_r| = m$ for some $m = 1, \ldots, M-1$, and initial
cache state $s_0$, the cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$ form an ergodic Markov
chain on the component $\Lambda(r, s_0)$. In particular the limit

$$ \pi_{r,s_0}(s;p) := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] \quad \text{a.s.} $$

always exists for every $s = (i_1, \ldots, i_M)$ in $\Lambda(M;\mathcal{N})$ with

$$ \pi_{r,s_0}(s;p) = C_r(p, s_0)^{-1} \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \tag{1.28} $$

for every $s$ in $\Lambda(r, s_0)$, and $\pi_{r,s_0}(s;p) = 0$ otherwise, with normalizing
constant

$$ C_r(p, s_0) := \sum_{(i_1, \ldots, i_M) \in \Lambda(r, s_0)} \ \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell). \tag{1.29} $$


10.  The  miss  rate  under  RORAs 
10.1  Case  1 

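Fix $s = \{i_1, \ldots, i_M\}$ in $\Lambda^*(M;\mathcal{N})$, and let $\Lambda(s|M;\mathcal{N})$ denote the
subset of $\Lambda(M;\mathcal{N})$ defined by

$$ \Lambda(s|M;\mathcal{N}) := \{ (j_1, \ldots, j_M) \in \Lambda(M;\mathcal{N}) : \{j_1, \ldots, j_M\} = \{i_1, \ldots, i_M\} \}. $$

By Lemma 1.11, the limit (1.9) exists and is given by

$$ Q_r(s;p) = \sum_{(j_1, \ldots, j_M) \in \Lambda(s|M;\mathcal{N})} C(p)^{-1}\, p(j_1) p(j_2) \cdots p(j_M) $$
$$ = C(p)^{-1}\, M! \cdot p(i_1) p(i_2) \cdots p(i_M) \tag{1.30} $$

with normalizing constant $C(p)$ given by (1.24). The last equality at
(1.30) follows from the fact that there are $M!$ elements in $\Lambda(s|M;\mathcal{N})$.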

Using (1.30) in conjunction with Theorem 1.1, we readily conclude
that under the RORA($r$) policy of Case 1 the miss rate (1.8) exists
as a constant which is independent of the initial cache state $s_0$. To
acknowledge this fact, we simply denote this limiting constant by $M_r(p)$.
Specializing (1.11) leads to

$$ M_r(p) = C(p)^{-1}\, M! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M) \sum_{i \notin \{i_1, \ldots, i_M\}} p(i) $$
$$ = C(p)^{-1}\, (M+1)! \sum_{\{i_1, \ldots, i_{M+1}\} \in \Lambda^*(M+1;\mathcal{N})} p(i_1) \cdots p(i_{M+1}) $$
$$ = C(p)^{-1}\, (M+1)! \cdot E_{M+1,N}(p). \tag{1.31} $$

On the other hand, the normalizing constant (1.24) can be written as

$$ C(p) = \sum_{(i_1, \ldots, i_M) \in \Lambda(M;\mathcal{N})} p(i_1) \cdots p(i_M) = M! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M) $$
$$ = M! \cdot E_{M,N}(p). \tag{1.32} $$

Combining (1.31) and (1.32) we finally get

$$ M_r(p) = (M+1) \cdot \frac{E_{M+1,N}(p)}{E_{M,N}(p)} = (M+1)\, \Phi_{M+1,N}(p) \tag{1.33} $$

and a straightforward application of Proposition 1.7 yields

Theorem 1.13 Under the RORA($r$) policy with $\Sigma_r$ empty, for
admissible pmfs $p$ and $q$ on $\mathcal{N}$, it holds that $M_r(q) \le M_r(p)$ whenever $p \prec q$.
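The closed form (1.33) makes the Case 1 miss rate straightforward to evaluate, and Theorem 1.13 can be observed on a small example. The sketch below is illustrative only; the helper names `esf` and `rora_miss_rate` and the two sample pmfs are our own choices.

```python
from typing import Sequence


def esf(x: Sequence[float], M: int) -> float:
    """Elementary symmetric function E_M of the components of x."""
    e = [1.0] + [0.0] * M
    for xi in x:
        for k in range(M, 0, -1):
            e[k] += xi * e[k - 1]
    return e[M]


def rora_miss_rate(p: Sequence[float], M: int) -> float:
    """Case 1 miss rate (1.33): (M + 1) * E_{M+1,N}(p) / E_{M,N}(p)."""
    if not 1 <= M < len(p):
        raise ValueError("need 1 <= M < N")
    return (M + 1) * esf(p, M + 1) / esf(p, M)


uniform = [0.25, 0.25, 0.25, 0.25]  # majorized by `skewed`
skewed = [0.4, 0.3, 0.2, 0.1]
rates = {M: (rora_miss_rate(uniform, M), rora_miss_rate(skewed, M)) for M in (1, 2, 3)}
```

For the uniform pmf the formula returns $(N-M)/N$, i.e. 0.75, 0.5 and 0.25 for M = 1, 2, 3, and the more skewed pmf yields a smaller miss rate at every cache size, in line with Theorem 1.13. When M = 1 the expression reduces to $1 - \sum_i p(i)^2$.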

10.2  Case  2 

Consider now the RORA($r$) policy when the set $\Sigma_r$ is not empty, say
with $|\Sigma_r| = m$ for some $m = 1, \ldots, M-1$, and let the cache be initially
in state $s_0$ in $\Lambda(M;\mathcal{N})$. By Lemma 1.12, for each $s = \{i_1, \ldots, i_M\}$ in
$\Lambda^*(M;\mathcal{N})$ the limit (1.9) exists and is given by

$$ Q_{r,s_0}(s;p) = \sum_{s' \in \Lambda(s|r,s_0)} \pi_{r,s_0}(s';p) \tag{1.34} $$

where $\Lambda(s|r,s_0)$ denotes the subset of $\Lambda(r,s_0)$ defined by

$$ \Lambda(s|r,s_0) := \{ (j_1, \ldots, j_M) \in \Lambda(r,s_0) : \{j_1, \ldots, j_M\} = \{i_1, \ldots, i_M\} \}. $$

The set $\Lambda(s|r,s_0)$ is non-empty if and only if

$$ \Sigma_r(s_0) \subseteq \{i_1, \ldots, i_M\} \tag{1.35} $$

so that $Q_{r,s_0}(s;p) = 0$ whenever the inclusion (1.35) does not hold.
With this in mind we define

$$ \Lambda^*(r,s_0) := \{ s = \{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N}) : \text{Eqn. (1.35) holds at } s \}. $$

Going back to (1.28) and (1.29), we now conclude that for each $s =
\{i_1, \ldots, i_M\}$ in $\Lambda^*(r,s_0)$, it holds

$$ Q_{r,s_0}(s;p) = \sum_{(j_1, \ldots, j_M) \in \Lambda(s|r,s_0)} C_r(p,s_0)^{-1} \prod_{j_\ell \notin \Sigma_r(s_0)} p(j_\ell) $$
$$ = C_r(p,s_0)^{-1}\, (M-m)! \cdot \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \tag{1.36} $$

where in the last equality we combined the set equality $\{j_1, \ldots, j_M\} =
\{i_1, \ldots, i_M\}$ with (1.35), and then made use of the identity $|\Lambda(s|r,s_0)| =
(M-m)!$.

Now, using (1.36) in conjunction with Theorem 1.1, we see that under
the RORA($r$) policy of Case 2 the miss rate (1.8) exists as a constant
which depends on the initial cache state $s_0$. We record this fact in the
notation by denoting this limiting constant by $M_r(p; s_0)$. As in Case 1,
specializing (1.11) leads to

$$ M_r(p; s_0) = C_r(p,s_0)^{-1}\, (M-m)! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(r,s_0)} \ \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \sum_{i \notin \{i_1, \ldots, i_M\}} p(i) $$
$$ = C_r(p,s_0)^{-1}\, (M-m+1)! \cdot E_{M-m+1,N}(t \cdot p) \tag{1.37} $$

where the element $t$ in $\mathbb{R}^N$ is specified by $t_i = 0$ for document $i$ in $\Sigma_r(s_0)$
and $t_i = 1$ otherwise. Moreover, by the same arguments as in Case 1,
we can simplify the normalizing constant $C_r(p, s_0)$ as

$$ C_r(p, s_0) = (M-m)! \cdot E_{M-m,N}(t \cdot p). \tag{1.38} $$

It then follows from (1.37) and (1.38) that

$$ M_r(p; s_0) = (M-m+1) \cdot \frac{E_{M-m+1,N}(t \cdot p)}{E_{M-m,N}(t \cdot p)} = (M-m+1)\, \Phi_{M-m+1,N}(t \cdot p). \tag{1.39} $$


Clearly, the documents in $\Sigma_r(s_0)$ do not contribute to the miss rate
since they never generate a miss once loaded in cache. This is regardless
of the order in which they appear in the cache state $s_0$. This intuitively
obvious fact is in agreement with the expression (1.39) from which we see
that for any two initial cache states $s_0$ and $s'_0$ in $\Lambda(M;\mathcal{N})$ with $\Sigma_r(s_0) =
\Sigma_r(s'_0)$, we have the equality $M_r(p; s_0) = M_r(p; s'_0)$. As a result, we shall
find it appropriate to denote this common value by $M_{r,\Sigma_r(s_0)}(p)$.
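The expression (1.39) depends on the initial state only through the set of pinned documents, which the following sketch makes explicit. It is an illustration under our own naming: `rora_case2_miss_rate` takes the set `pinned` of (0-based) document indices playing the role of $\Sigma_r(s_0)$, and the sample pmf is hypothetical.

```python
from typing import Sequence, Set


def esf(x: Sequence[float], M: int) -> float:
    """Elementary symmetric function E_M of the components of x."""
    e = [1.0] + [0.0] * M
    for xi in x:
        for k in range(M, 0, -1):
            e[k] += xi * e[k - 1]
    return e[M]


def rora_case2_miss_rate(p: Sequence[float], M: int, pinned: Set[int]) -> float:
    """Case 2 miss rate (1.39) when the documents in `pinned` never leave the cache."""
    m = len(pinned)
    if not 1 <= m <= M - 1:
        raise ValueError("need 1 <= |pinned| <= M - 1")
    # t . p: zero out the pinned documents, keep the others.
    tp = [0.0 if i in pinned else pi for i, pi in enumerate(p)]
    return (M - m + 1) * esf(tp, M - m + 1) / esf(tp, M - m)


p = [0.4, 0.3, 0.2, 0.1]
pin_most_popular = rora_case2_miss_rate(p, M=2, pinned={0})
pin_least_popular = rora_case2_miss_rate(p, M=2, pinned={3})
```

With M = 2 and one pinned document, the remaining slot behaves like a cache of size one on the unpinned documents. Pinning the most popular document gives a miss rate of 0.22/0.6, whereas pinning the least popular one gives 0.52/0.9, which is why the comparison in the next result is stated for the set of the $m$ most popular documents.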

For any pmf $p$ on $\mathcal{N}$, let $\Sigma^*(p)$ denote the set of the $m$ most popular
documents according to the pmf $p$. Equipped with the expression (1.39),
we are now ready to establish the result for RORA policies in Case 2.


Theorem 1.14 Under the RORA($r$) policy with $|\Sigma_r| = m$ for some
$m = 1, \ldots, M-1$, for admissible pmfs $p$ and $q$ on $\mathcal{N}$, it holds that

$$ M_{r,\Sigma^*(q)}(q) \le M_{r,\Sigma^*(p)}(p) \tag{1.40} $$

whenever $p \prec q$.

Proof. The desired result will be established if we can show that the
miss rate function $p \to M_{r,\Sigma_r(s_0)}(p)$ as given in (1.39) is Schur-concave
whenever $s_0$ is selected so that $\Sigma_r(s_0) = \Sigma^*(p)$.

As we can always relabel the documents, there is no loss of generality
in assuming $p(1) \ge p(2) \ge \ldots \ge p(N)$, whence $\Sigma^*(p) = \{1, \ldots, m\}$
and the element $t$ in (1.39) can be specified as $t_1 = \ldots = t_m = 0$ and
$t_{m+1} = \ldots = t_N = 1$. By Proposition 1.7, the mapping $\Phi_{M-m+1,N}$
is increasing and Schur-concave on $\mathbb{R}^N_+$, and by virtue of the defining
property of $\Sigma^*(p)$, we have

$$ M_{r,\Sigma^*(p)}(p) = (M-m+1) \cdot \min_{i = 1, \ldots, N!} \Phi_{M-m+1,N}(t \cdot \sigma_i(p)). $$

The mapping $h : \mathbb{R}^{N!} \to \mathbb{R} : y \to \min(y_1, \ldots, y_{N!})$ is clearly
increasing, symmetric and concave, while the mapping $\Phi_{M-m+1,N}$ is concave
on $\mathbb{R}^N_+$ by Proposition 1.7. Combining these facts with the expression
for $M_{r,\Sigma^*(p)}(p)$ obtained above, we conclude by Proposition 1.8 to the
Schur-concavity (in the pmf vector) of the miss rate functional (1.39)
under the RORA policy when $\Sigma_r(s_0) = \Sigma^*(p)$. ■


11.  The  output  under  RORAs 

We  now  discuss  the  popularity  pmf  of  the  output  generated  under  the 
RORA  policies. 

11.1  Case  1 

As we invoke Theorem 1.2, we can make use of the expressions (1.30)
in the relation (1.14). For each $i = 1, \ldots, N$, in the notation of
Theorem 1.10, this yields

$$ m_r(i;p) = \sum_{s \in \Lambda^*_i(M;\mathcal{N})} C(p)^{-1}\, M! \cdot p(i_1) p(i_2) \cdots p(i_M) = \frac{E_{M,N-1}(p^{(i)})}{E_{M,N}(p)} \tag{1.41} $$

where the last equality follows from (1.32).

Reporting (1.41) back into (1.13), we conclude that the popularity
pmf $p^*_r$ of the output produced by the RORA($r$) policy in Case 1 is indeed
of the form (1.20), and Theorem 1.10 gives us

Theorem 1.15 Under the RORA($r$) policy in Case 1, it holds that
$p^*_r \prec p$.

When $M = 1$, any demand-driven policy $\pi$ reduces to the policy that
evicts the only document in cache if the requested document is not in
cache. Specializing the results above, we find that the output pmf $p^*_\pi$ is
given by

$$ p^*_\pi(i) = \frac{p(i)\,(1 - p(i))}{\sum_{j=1}^{N} p(j)\,(1 - p(j))}, \quad i = 1, \ldots, N, \tag{1.42} $$

and Theorem 1.15 immediately leads to

Corollary 1.16 With $M = 1$, under any demand-driven replacement
policy $\pi$, the popularity pmf $p^*_\pi$ of the output is given by (1.42), and
satisfies $p^*_\pi \prec p$.
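As a quick numerical check of (1.42) and Corollary 1.16, consider the pmf $p = (0.5, 0.3, 0.2)$ on $N = 3$ documents. This example and its rounded figures are ours and are included only as an illustration.

```latex
\begin{align*}
\sum_{j=1}^{3} p(j)\,(1 - p(j)) &= 0.25 + 0.21 + 0.16 = 0.62, \\
p^*_\pi &= \left( \frac{0.25}{0.62},\ \frac{0.21}{0.62},\ \frac{0.16}{0.62} \right)
        \approx (0.4032,\ 0.3387,\ 0.2581), \\
0.4032 &\le 0.5, \qquad 0.4032 + 0.3387 = 0.7419 \le 0.5 + 0.3 = 0.8 .
\end{align*}
```

Both partial-sum conditions (1.17) hold and the two vectors sum to one, so that $p^*_\pi \prec p$ in this example: the output stream is more balanced than the input stream.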


11.2  Case  2 


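Assume $|\Sigma_r| = m$ for some $m = 1, \ldots, M-1$, and let the cache be
initially in state $s_0$. The pmf $\pi$ on $\Sigma_r(s_0)^c$ is defined as the conditional
pmf induced by $p$ on $\Sigma_r(s_0)^c$; it is given by

$$ \pi(i) = \frac{p(i)}{\sum_{j \in \Sigma_r(s_0)^c} p(j)}, \quad i \in \Sigma_r(s_0)^c. \tag{1.43} $$

For all $i$ in $\Sigma_r(s_0)$, it is clear that $m_{r,s_0}(i;p) = 0$ while for document $i$
not in $\Sigma_r(s_0)$, with the expression for $Q_{r,s_0}(s;p)$ given in (1.36), we find

$$ m_{r,s_0}(i;p) = \sum_{s \in \Lambda^*(r,s_0):\ i \notin s} C_r(p,s_0)^{-1}\, (M-m)! \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) $$
$$ = \frac{E_{M-m,N}(t^{(1)} \cdot p)}{E_{M-m,N}(t^{(2)} \cdot p)} $$
$$ = \frac{E_{M-m,N-m-1}(\pi^{(i)})}{E_{M-m,N-m}(\pi)} \tag{1.44} $$

where the elements $t^{(1)}$ and $t^{(2)}$ of $\mathbb{R}^N$ are specified by $t^{(1)}_j = t^{(2)}_j = 0$ for
document $j$ in $\Sigma_r(s_0)$, $t^{(1)}_i = 0$, $t^{(2)}_i = 1$ and $t^{(1)}_j = t^{(2)}_j = 1$ whenever
document $j \neq i$ is not in $\Sigma_r(s_0)$. In the second equality we made use of
the expression (1.38).

Combining (1.44) with (1.13), we immediately get

$$ p^*_{r,s_0}(i) = \begin{cases} 0 & \text{if } i \in \Sigma_r(s_0) \\[4pt] \dfrac{\pi(i)\, E_{M-m,N-m-1}(\pi^{(i)})}{\sum_{j \in \Sigma_r(s_0)^c} \pi(j)\, E_{M-m,N-m-1}(\pi^{(j)})} & \text{if } i \notin \Sigma_r(s_0). \end{cases} \tag{1.45} $$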


Since $p^*_{r,s_0}(i) = 0$ whenever $i$ belongs to $\Sigma_r(s_0)$, it is more natural to
seek a comparison between $p^*_{r,s_0}$ and the conditional pmf $\pi$.

Theorem 1.17 Under the RORA($r$) policy with $|\Sigma_r| = m$ for some
$m = 1, \ldots, M-1$, it holds that $p^*_{r,s_0} \prec \pi$.

Theorem 1.17 is essentially the same as Theorem 1.10. We
immediately obtain the desired result upon identifying $\pi$ and $\Sigma_r(s_0)^c$ with $p$
and $\mathcal{N}$ in Theorem 1.10, respectively (with $M - m$ playing the role of $M$).

12.  Zipf-like  pmfs 

It has been observed in a number of studies that the popularity
distribution of objects in request streams at Web caches is highly skewed.
In Almeida et al. (1996), a good fit was provided by the Zipf
distribution according to which the popularity of the $i$th most popular object is
inversely proportional to its rank, namely $1/i$.

In more recent studies by Breslau et al. (1999) and by Jin and Bestavros
(2000a), “Zipf-like” distributions (see footnote 7) were found more appropriate; see
Breslau et al. (1999) (and references therein) for an excellent summary.
Such distributions form a one-parameter family. In our set-up, we say
that the popularity pmf $p$ of the $\mathcal{N}$-valued rvs $\{R_t,\ t = 0,1,\ldots\}$ is
Zipf-like with parameter $\alpha \ge 0$ if

$$ p(i) = \frac{i^{-\alpha}}{C_\alpha(N)}, \quad i = 1, \ldots, N, \quad \text{with} \quad C_\alpha(N) := \sum_{i=1}^{N} i^{-\alpha}. \tag{1.46} $$

The pmf (1.46) will be denoted by $p_\alpha$. It is always the case that $p_\alpha(1) \ge
p_\alpha(2) \ge \ldots \ge p_\alpha(N)$. The case $\alpha = 1$ corresponds to the standard Zipf
distribution and, as studied by Breslau et al. (1999), the value of $\alpha$ was
typically found to be in the range 0.64 to 0.83.

Zipf-like pmfs are skewed towards the most popular objects. As $\alpha \to
0$, the Zipf-like pmf approaches the uniform distribution $u$ while as $\alpha \to
\infty$, it degenerates to the pmf $(1, 0, \ldots, 0)$. Extrapolating between these
extreme cases, we expect the parameter $\alpha$ of Zipf-like pmfs (1.46) to
measure the strength of skewness, with the larger $\alpha$, the more skewed
the pmf $p_\alpha$. The next result shows that majorization indeed captures
this fact, and so it is warranted to call $\alpha$ the skewness parameter of the
Zipf-like pmf.

Lemma 1.18 For $0 \le \alpha < \beta$, it holds that $p_\alpha \prec p_\beta$.

Lemma 1.18 can already be found in Marshall and Olkin (1979), B.2.b,
p. 130, and is an easy by-product of Lemma 1.6. In the spirit of Lemma
1.18 and the aforementioned folk theorem (1.3), we expect the miss rate
of the cache replacement policy to decrease as $\alpha$ increases. This has been
shown to be the case using simulations in Gadde, Chase and Rabinovich
(2001).

7 Such distributions are sometimes called generalized Zipf distributions.
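Lemma 1.18 is easy to observe numerically. The sketch below builds Zipf-like pmfs as in (1.46) and compares them through the partial-sum conditions (1.17) and (1.18); the function names and the choice N = 50 are illustrative assumptions, while the exponents 0.64 and 0.83 are the endpoints of the range quoted above.

```python
from typing import List, Sequence


def zipf_pmf(N: int, alpha: float) -> List[float]:
    """Zipf-like pmf (1.46): p(i) proportional to i^(-alpha), i = 1..N."""
    weights = [i ** (-alpha) for i in range(1, N + 1)]
    c = sum(weights)  # normalizing constant C_alpha(N)
    return [w / c for w in weights]


def is_majorized(x: Sequence[float], y: Sequence[float], tol: float = 1e-9) -> bool:
    """True if x is majorized by y."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    sx = sy = 0.0
    for xi, yi in zip(xs[:-1], ys[:-1]):
        sx += xi
        sy += yi
        if sx > sy + tol:
            return False
    return True


N = 50
low, high = zipf_pmf(N, 0.64), zipf_pmf(N, 0.83)
uniform = zipf_pmf(N, 0.0)
```

Here `is_majorized(low, high)` holds while the reverse comparison fails, and the uniform pmf obtained at $\alpha = 0$ is majorized by both.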

Zipf-like pmfs are used in the discussion of the LRU policy in the next
sections.


13.  The  miss  rate  under  the  LRU  policy 

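Under the IRM with admissible popularity pmf $p$, it is known (Aven,
Coffman and Kogan (1987), Thm. 9, p. 130 and Coffman and
Denning (1973), Thm. 6.5, p. 272) that the LRU cache states $\{\Omega_t,\ t =
0,1,\ldots\}$ form a stationary ergodic Markov chain over the finite state
space $\Lambda(M;\mathcal{N})$ with stationary distribution given by

$$ \pi_{LRU}(s;p) := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] = \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{j=1}^{k} p(i_j) \right)} \quad \text{a.s.} \tag{1.47} $$

for every $s = (i_1, \ldots, i_M)$ in $\Lambda(M;\mathcal{N})$. Consequently, the limit (1.9)
exists for each $s = \{i_1, \ldots, i_M\}$ in $\Lambda^*(M;\mathcal{N})$ as

$$ Q_{LRU}(s;p) = \sum_{(j_1, \ldots, j_M) \in \Lambda(s|M;\mathcal{N})} \frac{p(j_1) \cdots p(j_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(j_\ell) \right)} \tag{1.48} $$

where $\Lambda(s|M;\mathcal{N})$ is as defined in Section 1.10.1.

The miss rate of the LRU policy under IRM can then be evaluated
from (1.11) as

$$ M_{LRU}(p) = \sum_{(i_1, \ldots, i_M) \in \Lambda(M;\mathcal{N})} \frac{p(i_1) \cdots p(i_M) \left( 1 - \sum_{\ell=1}^{M} p(i_\ell) \right)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{j=1}^{k} p(i_j) \right)}. \tag{1.49} $$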


13.1  A  counterexample 

Contrary to what transpired with RORA policies, the miss rate under
the LRU policy is not Schur-concave in general, and consequently the
folk theorem (1.3) does not hold. This is demonstrated through the
following example developed for $M = 3$, $N = 4$, and the family of pmfs

$$ p(x,y) = (x,\ 1 - 2y - x,\ y,\ y), \quad 0 < y < \tfrac{1}{4}, $$

with $x$ in the interval $[\tfrac{1}{2} - y,\ 1 - 3y]$. Under these constraints, the
components of the pmf $p(x,y)$ are listed in decreasing order and for any
given $y$ it holds that $p(x,y) \prec p(x',y)$ whenever $x < x'$ in the
interval $[\tfrac{1}{2} - y,\ 1 - 3y]$. Therefore, if the miss rate under LRU was indeed
a Schur-concave function in the popularity pmf, we would expect the
functions $x \to M_{LRU}(p(x,y))$ to be monotone decreasing in $x$ on the
interval $[\tfrac{1}{2} - y,\ 1 - 3y]$.

Figure 1.1. LRU miss rate when $M = 3$, $N = 4$, $y = p(3) = p(4) = 0.05$,
$p(1) = x$ and $p(2) = 0.9 - p(1)$

Figure 1.2. LRU miss rate when $M = 3$, $N = 4$, $y = p(3) = p(4) = 0.01$,
$p(1) = x$ and $p(2) = 0.98 - p(1)$

Figures 1.1 and 1.2 display the numerical values of $M_{LRU}(p(x,y))$ as
a function of $x$ with $y = 0.05$ and $y = 0.01$, respectively; this was done
by numerical evaluation of (1.49). In both cases, the miss rate of the
LRU policy is not monotone decreasing in $x$ on the range $[\tfrac{1}{2} - y,\ 1 - 3y]$,
with the trend becoming more pronounced with decreasing $y$. In short,
the miss rate is not Schur-concave under the LRU policy.
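The counterexample can be reproduced by brute-force evaluation of (1.49), which is feasible here because the state space has only 24 ordered states. The sketch below is our own illustration (the function names `lru_miss_rate` and `p_xy` are not from the text); it sums the stationary probability of each ordered cache state times the probability that the next request misses.

```python
from itertools import permutations
from typing import List, Sequence


def lru_miss_rate(p: Sequence[float], M: int) -> float:
    """LRU miss rate under the IRM by direct evaluation of (1.49)."""
    total = 0.0
    for state in permutations(range(len(p)), M):
        prob, prefix = 1.0, 0.0
        for i in state:
            # Factor p(i_k) / (1 - p(i_1) - ... - p(i_{k-1})) of (1.47).
            prob *= p[i] / (1.0 - prefix)
            prefix += p[i]
        total += prob * (1.0 - prefix)  # miss: request outside the cache
    return total


def p_xy(x: float, y: float) -> List[float]:
    """The pmf family p(x, y) = (x, 1 - 2y - x, y, y) of the counterexample."""
    return [x, 1.0 - 2.0 * y - x, y, y]


y = 0.05
xs = [0.45, 0.55, 0.65, 0.85]  # points of [1/2 - y, 1 - 3y] = [0.45, 0.85]
rates = [lru_miss_rate(p_xy(x, y), 3) for x in xs]
```

For $y = 0.05$ the computed miss rate first rises as $x$ grows (approximately 0.0624 at $x = 0.45$ and 0.0627 at $x = 0.65$) before falling to approximately 0.0507 at $x = 0.85$, even though the pmfs become more skewed in the majorization order throughout.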

13.2  LRU  with  Zipf-like  popularity  pmfs 

While  the  miss  rate  is  not  Schur-concave  under  the  LRU  policy,  the 
desired  monotonicity  (1.3)  is  nevertheless  true  in  an  asymptotic  sense 
when  the  popularity  pmf  is  restricted  to  the  class  of  Zipf-like  pmfs. 


Theorem 1.19 Assume the input to have a Zipf-like popularity pmf $p_\alpha$
for some $\alpha > 0$. Then, there exist $\alpha^* = \alpha^*(M, N) > 0$ and $\Delta > 0$ such
that $M_{\mathrm{LRU}}(p_\beta) < M_{\mathrm{LRU}}(p_\alpha)$ whenever $\alpha^* < \alpha$ and $\alpha + \Delta < \beta$.


This result is a byproduct of the asymptotic equivalence

$$ \lim_{\alpha \to \infty} \frac{M_{\mathrm{LRU}}(p_\alpha)}{(M + 1)^{-\alpha}} = 2 \tag{1.50} $$

established in Vanichpun and Makowski (2004a).

Figure 1.3. LRU miss rate when the input has a Zipf-like popularity pmf $p_\alpha$
for $\alpha$ small ($0 \le \alpha \le 1$)

Figure 1.4. LRU miss rate when the input has a Zipf-like popularity pmf $p_\alpha$
for $\alpha$ large ($\alpha \ge 1$)

We have also carried out simulations of a cache operating under the LRU
policy when the input has a Zipf-like popularity pmf $p_\alpha$.$^8$ The number of
documents is set at N = 1,000 while the cache size is M = 100. The miss
rate of the LRU policy is displayed in Figures 1.3 and 1.4 for $\alpha$ small
($0 \le \alpha \le 1$) and $\alpha$ large ($\alpha \ge 1$), respectively. It appears that the miss
rate is indeed decreasing as the skewness parameter $\alpha$ increases across the
entire range of $\alpha$. This suggests that the folk theorem on miss rates
probably holds for the LRU policy when the comparison is made within the
class of Zipf-like popularity pmfs, hence the following

Conjecture 1.20 For arbitrary cache size M and number N of documents,
the function $\alpha \to M_{\mathrm{LRU}}(p_\alpha)$ is strictly decreasing on $[0, \infty)$.
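
A simulation in the spirit of the one described above can be sketched as follows. This is an illustrative reconstruction, not the authors' setup: N = 1,000 and M = 100 follow the text, while the request count, the warm-up fraction, the seed and the function names are assumptions made here.

```python
import random
from collections import OrderedDict
from itertools import accumulate

def zipf_pmf(alpha, N):
    """Zipf-like pmf: p(i) proportional to i^(-alpha), i = 1, ..., N."""
    weights = [i ** -alpha for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_lru_miss_rate(alpha, N=1000, M=100, requests=100_000, seed=0):
    """Estimate the LRU miss rate on an i.i.d. (IRM) request stream."""
    rng = random.Random(seed)
    cum = list(accumulate(zipf_pmf(alpha, N)))
    stream = rng.choices(range(N), cum_weights=cum, k=requests)
    cache = OrderedDict()          # keys ordered from least to most recent
    warmup = requests // 10        # early misses are not counted
    misses = 0
    for t, doc in enumerate(stream):
        if doc in cache:
            cache.move_to_end(doc)         # refresh recency on a hit
        else:
            if t >= warmup:
                misses += 1
            cache[doc] = None
            if len(cache) > M:
                cache.popitem(last=False)  # evict the least recently used
    return misses / (requests - warmup)

for alpha in (0.4, 0.8, 1.2, 2.0):
    print(f"alpha = {alpha:.1f}  estimated miss rate = {simulate_lru_miss_rate(alpha):.4f}")
```

The estimates are noisy, so such a run can only illustrate the decreasing trend over well-separated values of the skewness parameter; it says nothing about strict monotonicity.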


14.  The  output  under  the  LRU  policy 

With the expressions (1.47) for the stationary distribution of the LRU
cache state, it is a simple matter to check for each i = 1, ..., N, that

$$ m_{\mathrm{LRU}}(i; p) = \sum_{s \in \Lambda_i(M;N)} \pi_{\mathrm{LRU}}(s; p) = \sum_{s \in \Lambda_i(M;N)} \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{j=1}^{k} p(i_j) \right)} \tag{1.51} $$

where $\Lambda_i(M;N)$ denotes the set of elements in $\Lambda(M;N)$ which do not
contain i, i.e., $\Lambda_i(M;N) := \{ s = (i_1, \ldots, i_M) \in \Lambda(M;N) : i \notin s \}$.
Theorem 1.2 then gives the output popularity pmf in the form

$$ p^*_{\mathrm{LRU}}(i) = \frac{p(i)}{M_{\mathrm{LRU}}(p)} \sum_{s \in \Lambda_i(M;N)} \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{j=1}^{k} p(i_j) \right)} \tag{1.52} $$

for each i = 1, ..., N, as we make use of (1.16). We begin with a positive
result.

8 We choose simulations over numerical evaluation of (1.49) because this expression is not
suitable for numerical evaluation due to a combinatorial explosion.

Lemma  1.21  The  LRU  policy  is  a  good  policy. 

In what follows, let $p^*_\alpha$ denote the popularity pmf of the output
induced by an input with Zipf-like popularity pmf $p_\alpha$ (instead of the more
cumbersome $p^*_{\mathrm{LRU},\alpha}$).

14.1  Another  counterexample 

In view of Lemma 1.21, it is tempting to expect that the majorization
comparison $p^*_{\mathrm{LRU}} \prec p$ also holds under the LRU policy. This is not the
case as the following example demonstrates: With M = 3 and N = 4
under the Zipf-like popularity pmf (1.46) with $\alpha = 3$, we have computed
the output popularity pmf under the LRU policy using (1.52). The
numerical values of both input and output popularity pmfs are presented
in Table 1.1.

Table 1.1. $p_\alpha$ and $p^*_\alpha$ under the LRU policy when the input distribution is Zipf-like
with parameter $\alpha = 3$

i              1         2         3         4
$p_\alpha$     0.8491    0.1061    0.0314    0.0133
$p^*_\alpha$   0.0118    0.2031    0.3853    0.3998

By the definition of majorization (1.17)-(1.18), the comparison $p^*_\alpha \prec p_\alpha$
requires

$$ \min_{i=1,\ldots,N} p_\alpha(i) \le \min_{i=1,\ldots,N} p^*_\alpha(i), \tag{1.53} $$

in clear contradiction with Table 1.1, and therefore does not hold. On
the other hand, the comparison $p_\alpha \prec p^*_\alpha$ is not valid either since it calls
for the unmet requirement

$$ \max_{i=1,\ldots,N} p_\alpha(i) \le \max_{i=1,\ldots,N} p^*_\alpha(i). \tag{1.54} $$

In short, $p_\alpha$ and $p^*_\alpha$ are not comparable in the majorization ordering.
This situation does not represent an isolated incident as the next theorem
shows; its proof is available in Vanichpun and Makowski (2004b).
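
The entries of Table 1.1 and the two failed necessary conditions can be checked numerically. The sketch below is an illustration rather than the authors' computation: it assumes the standard closed form for the LRU stationary distribution under the IRM, normalizes the per-document miss masses to obtain the output pmf, and uses the hypothetical name `lru_output_pmf`. Brute-force enumeration is feasible here only because M = 3 and N = 4.

```python
from itertools import permutations

def lru_output_pmf(p, M):
    """Popularity pmf of the miss stream of an LRU cache under the IRM."""
    N = len(p)
    miss_mass = [0.0] * N      # miss_mass[i] = p(i) * P[document i not cached]
    for s in permutations(range(N), M):
        # Stationary probability of cache state s (most recent first).
        weight, mass = 1.0, 0.0
        for k, doc in enumerate(s):
            weight *= p[doc]
            mass += p[doc]
            if k < M - 1:
                weight /= 1.0 - mass
        for i in set(range(N)) - set(s):
            miss_mass[i] += p[i] * weight
    total = sum(miss_mass)     # overall miss rate, used to normalize
    return [m / total for m in miss_mass]

alpha, N, M = 3, 4, 3
w = [i ** -alpha for i in range(1, N + 1)]
p_in = [v / sum(w) for v in w]
p_out = lru_output_pmf(p_in, M)

print("input :", [round(v, 4) for v in p_in])
print("output:", [round(v, 4) for v in p_out])
# Necessary conditions (1.53) and (1.54), respectively.
print(min(p_in) <= min(p_out), max(p_in) <= max(p_out))
```

Both conditions evaluate to False in this configuration, which matches the conclusion that neither majorization comparison holds.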

Theorem 1.22 Assume the input to have a Zipf-like popularity pmf $p_\alpha$
for some $\alpha > 0$. If the number of documents N and the cache size M
satisfy the condition $N < M!$, then under the LRU policy, there exists
$\alpha^* = \alpha^*(M, N)$ such that $p^*_\alpha \prec p_\alpha$ does not hold whenever $\alpha > \alpha^*$.

14.2  A  conjecture 

Theorems 1.15 and 1.17 were valid for all values of M and N, and
for arbitrary admissible pmfs. While the counterexamples discussed
earlier dash our hope to get an analogous result for the LRU policy, the
possibility remains, fueled by Corollary 1.16, that the positive result is
nevertheless valid in some appropriate range of the parameters M and
N. We now explore this issue still with Zipf-like popularity pmfs (1.46).

Conjecture 1.23 Assume that the popularity pmf is the Zipf-like pmf
(1.46) with $\alpha > 0$. For each N = 1, 2, ..., there exists an integer $M^* =
M^*(\alpha; N)$ with $1 < M^* < N$ such that $p^*_\alpha \prec p_\alpha$ under the LRU policy
whenever $M = 1, \ldots, M^*$.

In support of this conjecture, we have carried out simulations of the
cache operating under the LRU policy when the input pmf is Zipf-like
with parameter $\alpha$ = 0.8, 1 and 2 and with N = 1,000. We find
the output popularity pmfs for different values of cache size, namely
M = 10, 50, 100, 500. The resulting output popularity pmfs in the
original order of documents are shown in Figure 1.5, while the results after
rearranging documents in the decreasing order of their output
probabilities are displayed in Figure 1.6.

From Figure 1.6 (a), when $\alpha = 0.8$, the comparison $p^*_\alpha \prec p_\alpha$ holds
for M = 10, 50. Indeed, from their respective plots, we observe that the
pmfs $p_\alpha$ and $p^*_\alpha$ when arranged in decreasing order intersect only once,
namely $p^*_\alpha([i]) \le p_\alpha(i)$, $i = 1, \ldots, k$, and $p^*_\alpha([i]) \ge p_\alpha(i)$, $i = k+1, \ldots, N$,
for some $k = 1, \ldots, N - 1$, where $p^*_\alpha([1]) \ge p^*_\alpha([2]) \ge \ldots \ge p^*_\alpha([N])$ are
the components of $p^*_\alpha$ arranged in decreasing order. This is the sufficient
condition for majorization comparison provided in Proposition 1.3.
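
The single-crossing test and the partial-sum definition of majorization are easy to check mechanically. The sketch below assumes the usual partial-sum characterization of majorization and is only an illustration: the function names, the tolerance and the two example pmfs are hypothetical and do not come from the simulations reported here.

```python
def is_majorized_by(q, p, tol=1e-12):
    """True if q is majorized by p: equal totals, and each partial sum of
    the decreasingly sorted q is at most the matching partial sum of p."""
    qs, ps = sorted(q, reverse=True), sorted(p, reverse=True)
    if len(qs) != len(ps) or abs(sum(qs) - sum(ps)) > tol:
        return False
    run_q = run_p = 0.0
    for a, b in zip(qs, ps):
        run_q += a
        run_p += b
        if run_q > run_p + tol:
            return False
    return True

def crosses_once(q, p):
    """Single-crossing condition: after sorting both pmfs in decreasing
    order, q lies below p up to some rank k and above p afterwards."""
    qs, ps = sorted(q, reverse=True), sorted(p, reverse=True)
    below = [a <= b for a, b in zip(qs, ps)]
    above = [a >= b for a, b in zip(qs, ps)]
    return any(all(below[:k]) and all(above[k:]) for k in range(1, len(qs)))

p = [0.5, 0.3, 0.15, 0.05]   # hypothetical input pmf
q = [0.4, 0.3, 0.2, 0.1]     # hypothetical, flatter pmf
print(crosses_once(q, p), is_majorized_by(q, p))   # prints: True True
```

The single-crossing test is only sufficient, so a False result from `crosses_once` does not by itself rule out majorization; the partial-sum check is the one that decides.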

However, for $\alpha = 0.8$ and M = 100, 500, despite the fact that in Figure
1.6 (a), $p^*_\alpha$ looks uniform in the range where document rank is smaller
than M, the comparison $p^*_\alpha \prec p_\alpha$ is invalid since the necessary condition
(1.53) does not hold. This violation, $\min_{i=1,\ldots,N} p^*_\alpha(i) < p_\alpha(N)$, can be
easily seen from Figure 1.5 (a) or from the subplot inside Figure 1.6 (a).
For $\alpha$ = 1 and 2, by the same arguments, we conclude from Figures 1.5
(b)-(c) and 1.6 (b)-(c) that the comparison $p^*_\alpha \prec p_\alpha$ holds for M = 10
but does not hold for the other cache sizes M = 50, 100, 500. Hence, these
experimental results agree with Conjecture 1.23, and suggest that the
value of $M^*(\alpha; N)$ in Conjecture 1.23 decreases as $\alpha$ increases.

Figure 1.5. LRU output popularity pmf with different cache sizes M when
the input has a Zipf-like pmf with (a) $\alpha = 0.8$, (b) $\alpha = 1$ and (c) $\alpha = 2$

Figure 1.6. LRU output popularity pmf with different cache sizes M when
the input has a Zipf-like pmf with (a) $\alpha = 0.8$, (b) $\alpha = 1$ and (c) $\alpha = 2$.
Documents are ranked according to their probabilities.